Foreword



I feel privileged that the 10th Advances in Computer Games Conference
(ACG 10) takes place in Graz, Styria, Austria. It is the first time that Austria 
acts as host country for this major event. The series of conferences started in 
Edinburgh, Scotland in 1975 and was then held four times in England, three 
times in The Netherlands, and once in Germany. The ACG-10 conference in 
Graz is special in that it is organised together with the 11th World
Computer-Chess Championship (WCCC), the 8th Computer Olympiad (CO), and the
European Union Youth Chess Championship. 

The 11th WCCC and ACG 10 take place in the Dom im Berg (Dome in the
Mountain), a high-tech space with multimedia equipment, located in the 
Schlossberg, in the centre of the city. The help of many sponsors (large and 
small) is gratefully acknowledged. They will make the organisation of this 
conference a success. In particular, I would like to thank the European Union 
for designating Graz as the Cultural Capital of Europe 2003. There are 24 
accepted contributions by participants from all over the world: Europe, Japan, 
USA, and Canada. The specific research results of the ACG 10 are expected to 
find their way to general applications. The results are described in the pages 
that follow. The international stature together with the technical importance of 
this conference reaffirms the mandate of the International Computer Games 
Association (ICGA) to represent the computer-games community. This is 
important when negotiating with FIDE or other representative bodies of game 
competitions on the organisation of a match against their domain-specific 
human World Champion. Moreover, the ICGA is the right organisation to 
represent the same community to the European Union to have the next series 
of events (WCCC, CO, ACG) organised in the framework of the Cultural 
Capital of Europe. I would hope that Graz is the start of such a trend. I am 
convinced that our city will do its utmost to let the participants feel at ease 
when they, for a moment, are not in the brain-teasing theories and experiments 
of their brainchilds. In summary, I wish you a good time in Graz. 



Kurt Jungwirth 

Organising Chair of the ACG 10 in Graz



September 2003 




Preface 



This book is the tenth in a well-established series originally describing 
the progress of computer-chess research. The book contains the papers of the 
10th international conference Advances in Computer Games (ACG), to be
hosted by the city of Graz (Styria, Austria), the Cultural Capital of Europe 
2003. The conference will take place from November 24 to 27, 2003 during 
the 11th World Computer-Chess Championship (WCCC) and the 8th
Computer Olympiad, which will be held simultaneously in Graz. The 
combination of the three events is expected to be a great success since it 
offers: science, competition, and top sport (in the domain of computer 
chess). It is the first time that the three events coincide. For Graz it is very 
fortunate that the ICGA (International Computer Games Association) 
decided in its Triennial Meeting in Maastricht 2002 to have the WCCC 
annually instead of triennially. 

In the last decade of the previous century the focus of much academic
research shifted from chess to other intelligent games. Perhaps, the two 
matches Kasparov played with DEEP BLUE were instrumental for this shift. 
Whatever the reason, it is obvious that the oriental game of Go currently 
plays a considerable part in intelligent games research. The tendency is 
clearly visible in the 10th ACG conference, where chess and Go are
represented by an equal amount of contributions. For historical reasons we 
start with chess, still turning out to be an inexhaustible testing ground for 
new ideas. 

The book contains 24 contributions by a variety of authors from all over 
the world. We have sequenced the contributions according to the type of 
game. As stated above we start with the research domains of chess (6 papers) 
and Go (6 papers). These are followed by checkers (2 papers) and Lines
of Action (2 papers). Finally, we are happy to show the breadth of the 10th
ACG conference by publishing another eight contributions on different
games each. They are: Hex, Othello, Amazons, Bao, Kriegspiel, Gaps, Oshi- 
Zumo, and New Wythoff games. We hope that our readers will enjoy 
reading the efforts of the researchers, who made this development possible. 
Below we give a brief account of all contributions. 
Chess

Chess is a game that has set the AI research scene for almost fifty years. 
The game dominated the games developments to a large extent. Since chess 
can hardly be characterized by a limited list of research topics, we are happy 
and surprised that the topics are completely different. The six contributions 
deal with (1) evaluation functions, (2) pruning of the search, (3) search and 
knowledge, (4) pattern recognition, (5) modelling, and (6) strategies. 

In Evaluation Function Tuning via Ordinal Correlation, Dave Gomboc,
Tony Marsland, and Michael Buro discuss the heart of any chess program: 
the evaluation function. They arrive at a metric for assessing the quality of a 
static evaluation function. Their application of ordinal correlation is 
fundamentally different from prior evaluation-function tuning techniques. 

In First Experimental Results of ProbCut Applied to Chess, Albert Xin 
Jiang and Michael Buro show that Multi-ProbCut is a technique not only 
successful in Othello and Shogi, but also in chess. The contribution discusses 
details of the implementation in the chess engine CRAFTY. The recorded 
results state that the new version wins over the original one with a 59 per
cent score in their test setup.

In Search versus Knowledge: An Empirical Study of Minimax on KRK, 
Alexander Sadikov, Ivan Bratko, and Igor Kononenko return to the old 
research topic of intricacies of the precise working of the minimax 
algorithm. Their empirical experiment throws a new light on this topic. 

In Static Recognition of Potential Wins in KNNKB and KNNKN, Ernst 
Heinz investigates the possibilities of how to recognize surprisingly tricky 
mate themes in the endgames named. He analyses the mate themes and 
derives rules from them which allow for a static recognition. He shows that 
such positions occur more frequently than generally assumed. 

In Model Endgame Analysis, Guy Haworth and Rafael Andrist introduce 
a reference model of fallible endgame play. The results are compared with a 
Markov model of the endgame in question and are found to be in close 
agreement with those of the Markov model. 

In Chess Endgames: Data and Strategy, John Tamplin and Guy Haworth 
compare Nalimov’s endgame tablebases with newly created tables in which 
alternative metrics have been applied. The research is on measuring the 
differences in strategy. 

Go 

The six contributions on the game of Go relate to the following general 
topics: (1) evaluation, (2) eyes, (3) search, (4) learning, (5) Monte-Carlo Go, 
and (6) static analysis. 
In Evaluation in Go by a Neural Network using Soft Segmentation,
Markus Enzenberger presents a network architecture that is applied to 
position evaluation. It is trained using self-play and temporal-difference 
learning combined with a rich two-dimensional reinforcement signal. One of 
the methods is able to play at a level comparable to a 13-kyu Go program. 

In When One Eye is Sufficient: A Static Classification, Ricard Vila and
Tristan Cazenave propose a new classification for eye shapes. The method is 
said to replace a possibly deep tree by a fast, reliable and static evaluation. 

In DF-PN in Go: An Application to the One-Eye Problem, Akihiro
Kishimoto and Martin Müller modify the depth-first proof-number search
algorithm and apply it to the game of Go. Subsequently, they develop a 
solver for one-eye problems. 

In Learning to Score Final Positions in the Game of Go, Erik van der 
Werf, Jaap van den Herik, and Jos Uiterwijk present a learning system that 
scores 98.9 per cent of the submitted positions correctly. Such a reliable 
scoring method opens the large source of Go knowledge and thus paves the 
way for a successful application in machine learning in Go. 

In Monte-Carlo Go Developments, Bruno Bouzy and Bernard 
Helmstetter report on the development of two Go programs OLGA and 
OLEG. The authors perform experiments to test their ideas on progressive 
pruning, temperature, and depth-two tree search within the Monte-Carlo 
framework. They conclude that such approaches are worth considering
in future research. 

In Static Analysis by Incremental Computation in Go Programming, 
Katsuhiko Nakamura describes two types of analysis and pattern 
recognition. One is based on the determination of groups almost settled, the 
other on an estimation of groups of stones and territories by analysing the 
influence of stones using the “electric charge” model. 

Checkers 

Both contributions on the game of checkers focus on endgame databases. 

In Building the Checkers 10-piece Endgame Databases, Jonathan 
Schaeffer, Yngvi Björnsson, Neil Burch, Robert Lake, Paul Lu, and Steve
Sutphen report on their results of building large endgame databases. They
describe actions such as compression, data organisation, and real-time
decompression. It is amazing to see that powerful techniques and machine 
power in itself are just not sufficient to crack the game. 

In The 7-piece Perfect Play Lookup Database for the Game of Checkers,
Edward Trice and Gilbert Dodgen examine the benefits and detriments 
associated with computing three different types of checkers endgame 
databases. They show major improvements to some previously published 
play. 




Lines of Action 

Two contributions concentrate on Lines of Action (LoA). 

In Search and Knowledge in Lines of Action, Darse Billings and Yngvi
Björnsson provide accurate descriptions of the design and development of
the programs YL and MONA. YL emphasizes fast and efficient search,
whereas MONA focuses on a sophisticated but relatively slow evaluation. It 
is an ideal relation for the investigation of the trade-off between search and 
knowledge. The results concur with well-known results from the chess 
world: (1) diminishing returns with additional search depth, and (2) the 
knowledge level of a program has a significant impact on the results. 

In An Evaluation Function for Lines of Action, Mark Winands, Jaap van
den Herik, and Jos Uiterwijk extensively describe the evaluation function
that brought MIA IV (Maastricht In Action) its successes. The important
elements are: concentration, centralisation, centre-of-mass position, quads, 
mobility, walls, connectedness, uniformity, and player-to-move. In the 
experiments, the evaluation function performs better at deeper searches 
showing the relevance of the components. 

Hex 

Solving 7x7 Hex: Virtual Connections and Game-State Reduction is a 
team effort by Ryan Hayward, Yngvi Björnsson, Michael Johanson, Morgan
Kan, Nathan Po, and Jack van Rijswijck. They develop an algorithm that 
determines the outcome of an arbitrary Hex game-state. The algorithm is 
based on the concept of a proof tree. 

Othello 

In Automated Identification of Patterns in Evaluation Functions, 
Tomoyuki Kaneko, Kazunori Yamaguchi, and Satoru Kawai propose a 
method that generates accurate evaluation functions using patterns, without 
expert players’ knowledge. The approach consists of three steps (generation 
of logical features, extracting of patterns, and selection of patterns) and is 
applied to the game of Othello. The authors report the successes of their 
method and claim that the accuracy is comparable to that of specialized 
Othello programs. 

Amazons 

In An Evaluation Function for the Game of Amazons, Jens Lieberum 
reveals the secrets of his program that won the Computer Olympiad in 
Maastricht 2002. The secret is the evaluation function. More on this topic 
can be found in the work itself. 
Bao

In Opponent-Model Search in Bao: Conditions for a Successful 
Application, Jeroen Donkers, Jaap van den Herik, and Jos Uiterwijk 
investigate the role of prediction and estimation. The rules of Bao are 
described and five evaluation functions are tested in tournaments. The 
domain of research is variable with respect to all kinds of versions of 
opponent modelling. The final result is that opponent-model search can be 
applied successfully, provided that the conditions are met. 

Kriegspiel 

In Computer Programming of Kriegspiel Endings: The Case of KR 
versus K, Andrea Bolognesi and Paolo Ciancarini describe the rationale and 
the design of a Kriegspiel program that plays the ending King and Rook 
versus King adequately. 

Gaps 

In Searching with Analysis of Dependencies in a Solitaire Card Game, 
Bernard Helmstetter and Tristan Cazenave present a new method of playing 
the card game Gaps. The method is an improvement of depth-first search by 
grouping several positions in a block and searching only on the boundaries 
of the blocks. 

Oshi Zumo 

In Solving the Oshi-Zumo Game, Michael Buro completes a previous 
analysis by Kotani. Buro’s Nash-optimal mixed strategies are non-trivial, but 
can be computed quickly. A discussion on ‘how good is optimal?’ concludes 
the article. 

New Wythoff Games 

In New Games Related to Old and New Sequences, Aviezri Fraenkel 
defines an infinite class of 2-pile subtraction games, where the amount that
can be subtracted from both piles simultaneously is a function f of the size of
the piles. Wythoff’s game is a special case. The author introduces new
sequences. The main result is a theorem giving necessary and sufficient
conditions on f so that the sequences are 2nd-player winning positions.
EVALUATION FUNCTION TUNING VIA
ORDINAL CORRELATION 



D. Gomboc, T. A. Marsland, M. Buro 
Abstract: Heuristic search effectiveness depends directly upon the quality of heuristic
evaluations of states in the search space. We show why ordinal correlation is 
relevant to heuristic search, present a metric for assessing the quality of a static 
evaluation function, and apply it to learn feature weights for a computer chess 
program. 

Keywords: ordinal correlation, Kendall’s τ (tau), static evaluation function, heuristic
search, computer chess



1. Introduction 

Inspiration for this research came while reflecting on how evaluation 
functions for today’s computer chess programs are usually developed. 
Typically, evaluation functions are refined over many years, based upon 
careful observation of their performance. During this time, engine authors 
will tweak feature weights repeatedly by hand in search of proper balance 
between terms. This ad hoc process is used because the principal way to 
measure the utility of changes to a program is to play many games against 
other programs and interpret the results. The process of evaluation function 
development would be considerably assisted by the presence of a metric that 
could reliably indicate a tuning improvement. But what would such a metric 
be like? 

The critical operation of minimax game-tree searches (Shannon, 1950) 
and all its derivatives (Marsland, 1983; Plaat, 1996) is the asking of a single 
question: is position B better than position A? Note that it is not “How much 
better?”, but simply “Is it better?”. In minimax, instead of propagating 
values one could propagate the positions instead, and, as humans do, choose 
between them directly without using values as an intermediary. 
Consequently, we need only pairwise comparisons that tell us whether B is
preferable to A. Plausibly, then, the metric we seek will assess how well an 
evaluation function orders positions in relation to each other, without placing 
importance on the relative differences in the values of the assessed positions 
- that is, it will be ordinal in nature. 

While at shallow depths some resemblance between positions compared 
by a minimax-based search will be evident, this does not hold true at the 
search depths typically reached today. The positions that are being compared 
are frequently completely different in character, suggesting that our mystery 
metric ought to compare pairs of positions not merely from local pockets of 
the search space but globally. 

Consideration was also given to harnessing the great deal of recorded 
experience of human chess for developing a static evaluation function. 
Researchers have tried to make their machines play designated moves from 
test positions, but we focus on judgments about the relative worth of 
positions, reasoning that if these are correct then strong moves will emerge 
as a consequence. But how does one compute a correlation between the 
(ordinal) human assessment symbols, given in Table 1, with machine 
assessments? A literature review identified that a statistical measure known 
as Kendall’s τ might be exactly what is needed.

After a brief overview of prior work on the automated tuning of static
evaluation functions, we describe Kendall’s τ, and our novel algorithm to
implement it efficiently. We then discuss the materials used for our
experiments, followed by details of our software implementation. Experimental
results are provided in Section 6. After drawing some conclusions, we suggest
further investigations to the interested researcher.

2. Prior Work 

The precursor of modern machine learning in games is the work done by
Samuel (1959, 1967). By fixing the value for a checker advantage, while
letting other weights float, he iteratively tuned the weights of evaluation
function features so that the assessments of predecessor positions became
more similar to the assessments of successor positions.

symbol   meaning
+-       white is winning
±        white has a clear advantage
⩲        white has an edge
=        the position is equal
⩱        black has an edge
∓        black has a clear advantage
-+       black is winning

Table 1. Symbols for chess position assessment.¹

1 Two other assessment symbols, ∞ (the position is unclear) and the symbol for
compensation (a player has positional compensation for a material deficit), are
also frequently encountered. Unfortunately, the usage of these two symbols is
not consistent throughout chess literature. Accordingly, we ignore positions
labeled with these assessments.

Hartmann (1989) developed the “Dap Tap” to determine the relative 
influence of various evaluation feature categories, or notions, on the 
outcome of chess games. Using 62,965 positions from grandmaster 
tournament and match games, he found that “the most important notions 
yield a clear difference between winners and losers of the games”. 
Unsurprisingly, the notion of material was predominant; the combination of 
other notions contributes roughly the same proportion to the win as material
did alone. He further concluded that the threshold for one side to possess a 
decisive advantage is 1.5 pawns. 

The Deep Thought (later Deep Blue) team applied least squares fitting 
to the moves of the winners of 868 grandmaster games to tune their 
evaluation function parameters as early as 1987 (Nowatzyk, 2000). They 
found that tuning to maximize agreement between their program’s preferred 
choice of move and the grandmaster’s was “not really the same thing” as 
playing more strongly. Amongst other interesting observations, they 
discovered that conducting deeper searches while tuning led to superior 
weight vectors being reached. 

Tesauro (1995) initially configured a neural network to represent the 
backgammon state in an efficient manner, and trained it via temporal 
difference learning (Sutton, 1988). After 300,000 self-play games, the 
program reached strong amateur level. Subsequent versions also contained 
hidden units representing specialized backgammon knowledge and used 
minimax search. TD-GAMMON is now a world-class backgammon player. 

Beal and Smith (1997) applied temporal difference learning to determine 
piece values for a chess program that included material, but not positional, 
terms. Program versions using weights resulting from five randomized self- 
play learning trials each won a match versus a sixth program version that 
used the conventional weights given in most introductory chess texts. They 
have since extended their reach to include piece-square tables for chess (Beal 
and Smith, 1999a) and piece values for Shogi (Beal and Smith, 1999b). 

Baxter, Tridgell, and Weaver (1998) applied temporal difference learning 
to the leaves of the principal variations returned by alpha-beta searches to 
learn feature weights for their program KnightCap. Through online play 
against humans, KNIGHTCAP’s skill level improved from beginner to strong 
master. The authors credit this to: the guidance given to the learner by the 
varying strength of its pool of opponents, which improved as it did; the 
exploration of the state space forced by stronger opponents who took 
advantage of KNIGHTCAP’s mistakes; the initialization of material values to 
reasonable settings, locating KNIGHTCAP’s weight vector “close in 
parameter space to many far superior parameter settings”. 
Buro (1995) estimated feature weights by performing logistic regression
on win/loss/draw-classified Othello positions. The underlying log-linear
model is well suited for constructing evaluation functions for approximating
winning probabilities. In that application, it was also shown that the 
evaluation function based on logistic regression can perform better than 
those based on linear and quadratic discriminant functions. Later, Buro 
(1999) presented a much superior approach, using linear regression and 
positions labeled with the final disc differential to optimize the weights of 
thousands of binary pattern features. 

Kendall and Whitwell (2001) evolved intermediate-strength players from 
a population of poor players by applying crossover and mutation operators to 
generate new weight vectors, while discarding vectors that performed poorly. 

3. Kendall’s Tau 

Concordance, or agreement, occurs where items are ranked in the same
order. Kendall’s τ is all about the similarities and differences in the ordering
of ordered pairs. Consider two pairs, (x_i, y_i) and (x_k, y_k). Compare both the x
values and the y values. Table 2 defines the relationship between the pairs.



relationship           relationship           relationship between
between x_i and x_k    between y_i and y_k    (x_i, y_i) and (x_k, y_k)

x_i < x_k              y_i < y_k              concordant
x_i < x_k              y_i > y_k              discordant
x_i > x_k              y_i < y_k              discordant
x_i > x_k              y_i > y_k              concordant
x_i = x_k              y_i ≠ y_k              extra y pair
x_i ≠ x_k              y_i = y_k              extra x pair
x_i = x_k              y_i = y_k              duplicate pair

Table 2. Relationships between ordered pairs.

Table 3 contains a grid representing ordered pairs of machine and human 
evaluations. The value in each cell indicates the number of corresponding 
pairs; blank cells indicate that no such pairs are in the data set. Sample 
machine and human assessments are on the x and y axes, respectively. 

To compute τ for a collection of ordered pairs, each ordered pair is
compared against all other pairs. The total number of concordant pairs is
designated S⁺ (“S-positive”). Similarly, the total number of discordant pairs
is designated S⁻ (“S-negative”).

Consider the table cell (0.0, =). There are six entries, containing seven
data points, located strictly below and to its left; these are concordant pairs
and so contribute to S⁺. The two discordant pairs, strictly below and to its
right, contribute to S⁻. We do not consider any cells from above the cell of
interest. If we did so, we would end up comparing each pair of ordered pairs
twice instead of once. Finally, the 2 contained in the cell indicates that there
are two (0.0, =) data points; hence the examination of this cell has produced
7 * 2 = 14 concordant pairs, and 2 * 2 = 4 discordant pairs.





[Table 3: grid of counts of (machine, human) assessment pairs. Machine
assessments, from -1.6 to 1.3, are on the x axis; human assessment symbols,
from -+ to +-, are on the y axis.]

Table 3. (machine, human) assessments, n = 25.



τ is given by:

\[
\tau = \frac{S^{+} - S^{-}}{n(n-1)/2}
\]

The denominator equals the number of unique possible comparisons between 
any two ordered pairs from a collection of n ordered pairs. 

For the data in Table 3, S⁺ is 162, S⁻ is 83, and n, the number of ordered
pairs, is 25. τ equals 0.2633; we might also say that the concordance of the
data is 0.2633. Possible concordance values range from +1, representing
complete agreement in ordering, to -1, representing complete disagreement
in ordering. Whenever there are extra or duplicate pairs, the values of +1 and
-1 are not achievable.
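
As an illustration of the definition, the following is a minimal Python sketch of the straightforward pairwise computation. It is not the authors' implementation; the function name `kendall_tau_a` and the encoding of human assessment symbols as integer ranks are assumptions made for the example.

```python
from itertools import combinations


def kendall_tau_a(pairs):
    """Return (s_plus, s_minus, tau) for a list of (x, y) ordered pairs.

    x is a machine assessment and y a human assessment encoded as a number
    (hypothetical encoding, e.g. -3 for "-+" up to +3 for "+-").
    Assumes at least two pairs are supplied.
    """
    s_plus = 0   # concordant comparisons
    s_minus = 0  # discordant comparisons
    for (xi, yi), (xk, yk) in combinations(pairs, 2):
        product = (xi - xk) * (yi - yk)
        if product > 0:
            s_plus += 1
        elif product < 0:
            s_minus += 1
        # product == 0: extra x pair, extra y pair, or duplicate pair;
        # these count in the denominator but in neither S+ nor S-.
    n = len(pairs)
    tau = (s_plus - s_minus) / (n * (n - 1) / 2)
    return s_plus, s_minus, tau


# Arithmetic check against the figures quoted for Table 3:
# (162 - 83) / (25 * 24 / 2) = 79 / 300, about 0.2633.
tau_table3 = (162 - 83) / (25 * 24 / 2)
```

Each pair of ordered pairs is visited once, so the cost is quadratic in the number of data points, which is the motivation for a faster method on large data sets.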

Cliff (1996) provides a more detailed exposition of Kendall’s τ,
discussing variations thereof that optionally disregard extra and duplicate
pairs. Cliff labels what we call τ as τ_a, and uses it most often, noting that it
has the simplest interpretation of the lot.

A straightforward implementation would perform the process illustrated 
above for each cell of the table. Our novel, algorithmically superior 
implementation allocates additional memory space, and in successive single 
passes through the data, applies dynamic programming to compute tables 
containing the number of data points that are: 

either on the same row as or below the current cell; 
either on the same column or to the right of the current cell; 
either on the same column or to the left of the current cell; 
strictly below and to the right of the current cell; 
strictly below and to the left of the current cell. 

Then, in a final pass, S⁺ and S⁻ are computed by multiplying the number of
data points in the current cell by the data in the final two tables listed. It is
also possible to use more passes, but less memory, by performing the sweeps
to the left and to the right serially instead of in parallel.

There is a better-known ordinal metric in common use: Spearman’s ρ,
also known as Spearman correlation. In our application, the number of
distinct human assessments is constant. Therefore, after initial data
processing has identified the unique machine assessments for memory
allocation and indexing purposes, τ is computed in time linear in the number
of unique machine assessments, which is not possible for ρ. Prototype
implementations confirmed that τ was significantly quicker to compute for
large data sets.

Not only does τ more directly measure what interests us (“for all pairs of
positions (A, B), is position B better than position A?”), it is also more
efficient to compute than plausible alternatives. Therefore, we concentrate
on τ in this paper.

4. Chess-Related Components 

Many chess programs, or chess engines, exist. Some are commercially 
available; most are hobbyist. For our work, we selected Crafty, by Robert 
Hyatt (1996) of the University of Alabama. Crafty is the best chess engine 
choice for our work for several reasons: the source was readily available to 
us, facilitating experimentation; it is the strongest such open-source engine 
today; previous research has already been performed using Crafty. We 
worked with version 19.1 of the program. 

4.1 Training Data 

To assess the correlation of τ with improved play, we used 649,698 positions
from Chess Informant 1 through 85 (Sahovski, 1966). These volumes cover
the important chess games played between January 1966 and September
2002. This data set was selected because it contains a variety of assessed
positions from modern grandmaster play, the assessments are made by
qualified individuals, it is accessible in a non-proprietary electronic form,
and chess players around the world are familiar with it.

We used a 32,768-position subset for the preliminary feature weight 
tuning experiments reported here. 

4.2 Test Suites 

English chess grandmaster John Nunn (1999) developed the Nunn and Nunn 
II test suites of 10 and 20 positions, respectively. They serve as starting 
positions for matches between computer chess programs, where the 
experimenter is interested in the engine’s playing skill independent of the
quality of its opening book. Nunn selected positions that are approximately 
balanced, commonly occur in human games, and exhibit variety of play. We 
refer to these collectively as the “Nunn 30”. 

Don Dailey, known for his work on StarSocrates and CilkChess, 
prepared a file of two hundred commonly reached positions, all of which are 
ten ply from the initial position. We refer to these collectively as the “Dailey
200”.



5. Software Implementation 

Here we detail some specifics of our implementation. We discuss both 
alterations made to CRAFTY and new software written as a platform for our 
experiments. 

5.1 Use of Floating-Point Computation 

We modified CRAFTY so that variables holding machine assessments are 
declared to be of an aliased type rather than directly as integers. This allows 
us to choose whether to use floating-point or integer arithmetic via a 
compilation switch. The use of floating-point computation provides a 
learning environment where small changes in values can be rewarded. With 
these modifications, Crafty is slower, but only by a factor of two to three 
on a typical personal computer. The experiments were performed with this 
modified version; however, all test matches were performed with the 
original, integer-based evaluation implementation. Further details can be 
found in Section 6. 

It might strike the reader as odd that we chose to alter CRAFTY in this 
manner rather than scaling up all the evaluation function weights. There are 
significant practical disadvantages to that approach. How would we know 
that everything had been scaled? It would be easy to miss some value that 
needed to be changed. How would we identify overflow issues? It might be 
necessary to switch to a larger integer type. How would we know that we 
had scaled up the values far enough? It would be frustrating to have to repeat 
the procedure. 

By contrast, the choice of converting to floating-point is safer. Precision 
and overflow are no longer concerns. Also, by setting the typedef to be a 
non-arithmetic type we can cause the compiler to emit errors wherever type 
mismatches exist. Thus, we can be more confident that our experiments rest 
upon a sound foundation. 
5.2 Hill Climbing

We implemented an iteration-based learner, and a hill-climbing algorithm. 
Other iteration-based algorithms may be substituted for the hill-climbing 
code if desired. Because we are not working with an analytic function, we 
measure the gradient empirically. 

We multiply V_current, the current weight of a feature being tuned, by a 
number fractionally greater than one¹ to get V_high, except when V_current is 
near zero, in which case a minimum distance between V_current and V_high is 
enforced. V_low is then set to be equidistant from V_current, but in the other 
direction, so that V_current is bracketed between V_low and V_high. Two test 
weight vectors are generated: one using V_high, the other using V_low. All 
other weights for these test vectors remain the same as in the base vector. 
This procedure is performed for each weight that is being tuned. For example, 
when 11 parameters are being learned, 1 + 11*2 = 23 vectors are examined 
per iteration: the base vector, and 22 test vectors. 
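
The bracketing step can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the 1.01 multiplier comes from the footnote, but MIN_DISTANCE and its value are hypothetical, and the sketch takes the distance from the magnitude of the weight so that the upper value stays above the current one for negative weights as well.

```c
#include <math.h>

#define STEP_FACTOR  1.01  /* multiplier reported in the footnote */
#define MIN_DISTANCE 0.5   /* hypothetical floor applied near zero */

typedef struct { double low, high; } bracket_t;

/* Bracket one weight: low and high are equidistant from v_current. */
static bracket_t bracket_weight(double v_current) {
    /* Assumption: use the magnitude, so high > v_current even when
     * the weight is negative. */
    double distance = fabs(v_current) * (STEP_FACTOR - 1.0);
    if (distance < MIN_DISTANCE)
        distance = MIN_DISTANCE;
    bracket_t b = { v_current - distance, v_current + distance };
    return b;
}

/* One base vector plus two test vectors per tuned weight. */
static int vectors_per_iteration(int tuned_weights) {
    return 1 + 2 * tuned_weights;
}
```

With 11 tuned weights, vectors_per_iteration returns the 23 vectors mentioned in the text.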

The three computed concordances related to a weight being tuned (τ_current, 
τ_low, and τ_high) are then compared. If all three are roughly equal, no change 
is made: we select V_current. If τ_current is lower than both τ_low and τ_high, 
we choose the V corresponding to the highest τ. If they are in either increasing 
or decreasing order, we use the slope of test points (V_low, τ_low) and 
(V_high, τ_high) to interpolate a new point. However, to avoid occasional large 
swings in parameter settings, we bound the maximum change from V_current. The 
final case occurs when τ_current is higher than both τ_low and τ_high. In this 
case, we apply inverse parabolic interpolation to select the apex of the 
parabola formed by the three points, in the hope that this will lead us to the 
highest τ in the region. 
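
The four cases can be sketched in C as shown below. This is a hedged illustration only: the text does not give the tolerance for "roughly equal", the bound on the change, or the exact slope rule, so TAU_TOLERANCE, MAX_CHANGE, and SLOPE_GAIN (and their values) are hypothetical, and the slope case simply moves in proportion to the measured slope.

```c
#include <math.h>

#define TAU_TOLERANCE 1e-4    /* hypothetical meaning of "roughly equal" */
#define MAX_CHANGE    5.0     /* hypothetical bound on the change per step */
#define SLOPE_GAIN    1000.0  /* hypothetical step size per unit of slope */

/* Keep the new value within MAX_CHANGE of the current one. */
static double bound_change(double v_cur, double v_new) {
    if (v_new > v_cur + MAX_CHANGE) return v_cur + MAX_CHANGE;
    if (v_new < v_cur - MAX_CHANGE) return v_cur - MAX_CHANGE;
    return v_new;
}

/* Choose the next value of one weight; requires v_low < v_cur < v_high. */
static double next_weight(double v_low, double t_low,
                          double v_cur, double t_cur,
                          double v_high, double t_high) {
    /* All three concordances roughly equal: no change. */
    if (fabs(t_low - t_cur) < TAU_TOLERANCE &&
        fabs(t_high - t_cur) < TAU_TOLERANCE)
        return v_cur;

    /* Current value is the worst of the three: take the best endpoint. */
    if (t_cur < t_low && t_cur < t_high)
        return (t_high > t_low) ? v_high : v_low;

    /* Current value is the best: apex of the parabola through the
     * three points (inverse parabolic interpolation). */
    if (t_cur > t_low && t_cur > t_high) {
        double p = (v_cur - v_low) * (t_cur - t_high);
        double q = (v_cur - v_high) * (t_cur - t_low);
        double apex = v_cur
            - 0.5 * ((v_cur - v_low) * p - (v_cur - v_high) * q) / (p - q);
        return bound_change(v_cur, apex);
    }

    /* Increasing or decreasing order: follow the slope between the
     * two test points, with a bounded step. */
    double slope = (t_high - t_low) / (v_high - v_low);
    return bound_change(v_cur, v_cur + SLOPE_GAIN * slope);
}
```

In the parabolic case the denominator p - q is strictly positive whenever the middle concordance is strictly the highest, so no division by zero can occur there.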

Once this procedure has been performed for all of the weights being 
learned, it is possible to postprocess the weight changes, for instance to 
normalize them. However, at present we have not found this to be necessary. 
The chosen values now become the new base vector for the next iteration. 

5.3 Automation 

A substantial amount of code was written to automate the communication of 
work and results between multiple, distributed instantiations of CRAFTY and 
the PostgreSQL database. We implemented placeholder scheduling (Pinchak, 
2002) so that learning could occur more rapidly, and without human 
intervention. 



1 The tuning experiments reported in this paper used 1.01. 
Traditionally, researchers have used search depth to quantify search effort. 
For our learning algorithm, doing so would not be appropriate: the amount of 
effort required to search to a fixed depth varies wildly between positions, 
and we will be comparing the assessments of these positions. However, 
because we did not have the dedicated use of computational resources, we 
could not use search time either. While it is known that chess engines tend to 
search more nodes per second in the endgame than the middlegame, this 
difference is insignificant for our short searches because it is dwarfed by the 
overhead of preparing the engine to search an arbitrary position. Therefore, 
we chose to quantify search effort by the number of nodes visited. 

We instructed CRAFTY to search 16,384 nodes to assess a position. 
Earlier experiments that directly called the static evaluation or quiescence 
search routines to form assessments were not successful. When searching 
1,024 nodes per position, we had mixed results. Like the DEEP THOUGHT 
team (Nowatzyk, 2000), we found that larger searches improve the quality of 
learning. The downside is, of course, the additional processor time required 
by the learning process. 

There are positions in our data set from which CRAFTY does not complete 
a 1-ply search within 16,384 nodes, because its quiescence search explores 
many sequences of captures. When this occurs, no evaluation score is 
available to use. Instead of using either zero or the statically computed 
evaluation (which is not designed to operate without a quiescence search), 
we chose to throw away the data point for that particular computation of τ, 
reducing the position count (n). However, the value of τ for similar data of 
different population sizes is not necessarily constant. As feature weights are 
changed, the shape of the search tree for positions may also change. This can 
cause CRAFTY to not finish a 1-ply search for a position within the node 
limit where it was previously able to do so, or vice versa. When many 
transitions in the same direction occur simultaneously, noticeable 
irregularities are introduced into the learning process. Ignoring the node 
count limitation until the first ply of search has been completed may be a 
better strategy. 

5.5 Performance 

Early experiments were performed using idle time on various machines in 
our department. Lately, we have had (non-exclusive) access to clusters of 
personal computer workstations, which is helpful because the task of 
computing τ for distinct weight vectors within an iteration is trivially 
parallel. Examining 32,768 positions and computing τ takes about two 
minutes per weight vector. The cost of computing τ itself is negligible in 
comparison with the searches, so in the best case, when there are enough nodes 
available for the concordances of all weight vectors of an iteration to be 
computed simultaneously, learning proceeds at the rate of 30 iterations per hour. 



6. Experimental Results 

We demonstrate that concordance between human judgments and machine 
assessments increases with increasing depth of machine search. This result, 
combined with knowing that play improves as search depth increases 
(Thompson, 1982), in turn justifies our attempt to use this concordance as a 
metric to tune selected feature weights of CRAFTY’s static evaluation 
function. 

6.1 Concordance as Machine Search Effort Increases 



In Table 4 we computed τ for depths 1 through 7 for n = 649,698 positions, 
performing work equivalent to 211 billion (10⁹) comparisons at each depth. 
S⁺ and S⁻ are reported in billions. As search depth increases, the difference 
between S⁺ and S⁻, and therefore τ, also increases. The sum of S⁺ and S⁻ is 
not constant because at different depths different amounts of extra y-pairs 
and duplicate pairs are encountered. 
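
As a rough cross-check of the figures in Table 4, the sketch below recomputes τ from S⁺ and S⁻ under the assumption that the normalising term is close to the total number of position pairs, n(n-1)/2 (about 211 billion here). This is an approximation: the exact denominator also depends on the tied pairs, which Table 4 does not list, so agreement is only expected to about three decimal places. The function name approx_tau is hypothetical.

```c
/* Approximate tau from the concordant (s_plus) and discordant
 * (s_minus) pair counts. ASSUMPTION: the normaliser is taken to be
 * the total number of pairs n(n-1)/2, ignoring corrections for ties. */
static double approx_tau(double s_plus, double s_minus, double n) {
    double total_pairs = n * (n - 1.0) / 2.0;
    return (s_plus - s_minus) / total_pairs;
}
```

For the depth 1 row, approx_tau(110.374e9, 65.298e9, 649698.0) comes out close to the reported 0.2136.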



depth    S⁺ / 10⁹    S⁻ / 10⁹      τ
  1      110.374      65.298     0.2136
  2      127.113      48.934     0.3705
  3      131.384      45.002     0.4093
  4      141.496      36.505     0.4975
  5      144.168      34.726     0.5186
  6      149.517      30.136     0.5656
  7      150.977      29.566     0.5753

Table 4. τ computed for various search depths, n = 649,698. 

It is difficult to predict how close an agreement might be reached using 
deeper searches. Two effects come into play: diminishing returns from 
additional search, and diminishing accuracy of human assessments relative to 
ever more deeply searched machine assessments. 

Particularly interesting is the odd-even effect on the change in τ as 
depth increases. It has long been known that searching to the next depth of 
an alpha-beta search requires relatively much more effort when that next 
depth is even than when it is odd (Marsland, 1983). Notably, τ tends to 
increase more in precisely these cases. 

Similar experiments performed using increasing node counts, and 
increasing wall clock time (on a dedicated machine) with a different, smaller 
data set also gave increasing concordance, but, as expected, did not exhibit 
the staggered rise of the increasing depth searches. In sum, these 
experiments lend credibility to our belief that τ is a direct measure of 
decision quality. 
6.2 Tuning of CRAFTY’s Feature Weights 



Crafty uses centipawns (hundredths of a pawn) as its evaluation function 
resolution, so experiments were performed by playing CRAFTY as distributed 
versus Crafty with the learned weights rounded to the nearest centipawn. 
Each program played each position both as White and as Black. The feature 
weights we tuned are given along with their default values in Table 5. 



The scaling factors were chosen because they act as control knobs for many 
subterms. Bishop and knight were included because they participate in the most 
common piece imbalances. Trading a bishop for a knight is common, so it is 
important to include both to show that one is not learning to be of a certain 
weight chiefly because of the weight of the other. We also included three of 
the most important positional terms involving rooks. Material values for the 
rook and queen are not included because trials showed that they climbed even 
more quickly than the bishop and knight do, yielding no new insights. 



feature                                   default value
king safety scaling factor                     100
king safety asymmetry scaling factor           -40
king safety tropism scaling factor             100
blocked pawn scaling factor                    100
passed pawn scaling factor                     100
pawn structure scaling factor                  100
bishop                                         300
knight                                         300
rook on the seventh rank                        30
rook on an open file                            24
rook behind a passed pawn                       40

Table 5. Tuned features, with CRAFTY’s default values. 



6.2.1 Tuning from Arbitrary Values 

Figure 1 illustrates the learning. The 11 parameters were all initialized to 50, 
where 100 represents both the value of a pawn and the default value of most 
scaling factors. For ease of interpretation, legend contents are ordered to 
match up with the vertical ordering of corresponding data at the rightmost 
point on the x-axis. For instance, bishop is the topmost value, followed by 
knight, then τ, and so on. τ is measured on the left y-axis in linear scale; 
weights are measured on the right y-axis in logarithmic scale, for improved 
visibility of the weight trajectories. 

Rapid improvement is made as the bishop and knight weights climb 
swiftly to about 285, after which τ continues to climb, albeit more slowly. 
We attribute most of the improvement in τ to the proper determination of 
weight values for the minor pieces. All the material and positional weights 
are tuned to reasonable values. 




[Figure 1 plot. Left y-axis: concordance (τ), linear scale. Right y-axis: 
value (pawn = 100; default scaling factor = 100), logarithmic scale. Legend, 
giving initial -> final values:] 

bishop (50 -> 294) 
knight (50 -> 287) 
tau (0.2692 -> 0.3909) 
king tropism s.f. (50 -> 135) 
pawn structure s.f. (50 -> 106) 
blocked pawns s.f. (50 -> 76) 
passed pawn s.f. (50 -> 52) 
king safety s.f. (50 -> 52) 
rook on open file (50 -> 42) 
rook on 7th rank (50 -> 35) 
rook behind passed pawn (50 -> 34) 
king safety asymmetry s.f. (50 -> 8) 

Figure 1. Change in weights from 50 as τ is maximized. 

The scaling factors learned are more interesting. The king tropism and 
pawn structure scaling factors gradually reached, then exceeded Crafty’s 
default values of 100. The scaling factors for blocked pawns, passed pawns, 
and king safety are lower, but not unreasonably so. However, the king safety 
asymmetry scaling factor dives quickly and relentlessly. Crafty’s default 
value for this term is -40; perhaps we should have started it at a lower value 
to speed convergence. 

Tables 6 and 7 contain match results of the weight vectors at specified 
iterations during the learning illustrated in Figure 1. Each side plays each 
starting position both as White and as Black, so with the Nunn 30 test, 60 
games are played, and with the Dailey 200 test, 400 games are played. 
Games reaching move 121 were declared drawn. 
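
The percentage scores reported in the match tables of this section are consistent with the usual chess convention of one point per win and half a point per draw. The short sketch below makes that arithmetic explicit; the function names are hypothetical and chosen only for illustration.

```c
/* Percentage score of a match: a win counts one point, a draw half
 * a point, and the total is divided by the number of games played. */
static double percentage_score(int wins, int draws, int losses) {
    int games = wins + draws + losses;
    if (games == 0)
        return 0.0;
    return 100.0 * (wins + 0.5 * draws) / games;
}

/* Games in a match where each side plays every starting position
 * both as White and as Black. */
static int games_in_match(int starting_positions) {
    return 2 * starting_positions;
}
```

For example, 3 wins, 1 draw, and 56 losses over the 60 games of the Nunn 30 suite give (3 + 0.5) / 60, or about 5.83%.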

The play of the tuned program improves dramatically as learning occurs. 
Of interest is the apparent gradual decline in percentage score for later 
iterations on the Nunn 30 test suite. The DEEP THOUGHT team (Nowatzyk, 
2000) found that their best parameter settings were achieved before reaching 
maximum agreement with GM players. Perhaps we are also experiencing 
this phenomenon. We used the Dailey 200 test suite to attempt to confirm 
that this was a real effect, and found that by this measure too, the weight 
vectors at iterations 300 and 400 were superior to later ones. 



Throughout our experimentation, we have found that our tuned feature weights 
tend to perform better on the Nunn test suite than the Dailey test suite. 
Nunn’s suite contains positions of particular strategic and tactical 
complexity. Dailey’s suite is largely more staid, and contains positions from 
much earlier in the game. CRAFTY’s default weights appear to be more 
comfortable with the latter than the former. 

iteration    wins    draws    losses    percentage score
    0          3        1       56            5.83
  100          3        9       48           12.50
  200         14       21       25           40.83
  300         21       26       13           56.67
  400         19       28       13           55.00
  500         18       26       16           51.67
  600         18       23       19           49.17

Table 6. Match results (11 weights tuned from 50 vs. 
default weights), 5 minutes per game, Nunn 30 test suite. 



iteration    wins    draws    losses    percentage score
    0          3       13      384            2.38
  100         12       31      357            6.88
  200         76      128      196           35.00
  300        128      152      120           51.00
  400        129      143      128           50.13
  500        107      143      150           44.63
  600        119      158      123           49.50

Table 7. Match results (11 weights tuned from 50 vs. 
default weights), 5 minutes per game, Dailey 200 test suite. 

We conclude that the learning is able to yield settings that perform 
comparably to settings tuned by hand over years of games versus 
grandmasters. 



6.2.2 Tuning from CRAFTY’s Default Values 

We repeated the just-discussed experiment with one change: the feature 
weights start at CRAFTY’s default values rather than at 50. Figure 2 depicts 
the learning. Note that we have negated the values of the king safety 
asymmetry scaling factor in the graph so that we could retain the logarithmic 
scale on the right y-axis, and also for a second reason that is discussed below. 

While most values remain normal, the king safety scaling factor 
surprisingly rises to almost four times the default value. Meanwhile, the king 
safety asymmetry scaling factor descends even below -100. The 
combination indicates a complete lack of regard for the opponent’s king 
safety, but great regard for its own. Table 8 shows that this conservative 
strategy is by no means an improvement. 

[Figure 2 plot. Left y-axis: concordance (τ). Right y-axis: value (pawn = 100; 
default scaling factor = 100), logarithmic scale. Legend, giving initial -> 
final values:] 

king safety s.f. (100 -> 362) 
tau (0.4186 -> 0.5130) 
bishop (300 -> 279) 
knight (300 -> 274) 
0 - king safety asym. s.f. (-40 -> -132) 
king tropism s.f. (100 -> 119) 
blocked pawns s.f. (100 -> 111) 
pawn structure s.f. (100 -> 93) 
passed pawn s.f. (100 -> 88) 
rook behind passed pawn (40 -> 36) 
rook on 7th rank (30 -> 33) 
rook on open file (24 -> 26) 

Figure 2. Change in weights from CRAFTY’s defaults as τ is maximized. 



iteration    wins    draws    losses    percentage score
   25         19       23       18           50.83
   50         16       31       13           52.50
   75         11       32       17           45.00
  100         14       28       18           46.67
  125          9       23       28           34.17
  150          8       35       17           42.50

Table 8. Match results (11 weights tuned from defaults vs. 
default weights), 5 minutes per game, Nunn 30 test suite. 



The most unusual behaviour of the king safety and king safety asymmetry 
scaling factors deserves specific attention. When the other nine terms are 
left constant, these two terms behave similarly to how they do when all 
eleven terms are tuned. In contrast, when these two terms are held 
constant, no statistically significant performance difference is found between 
the learned weights and CRAFTY’s default weights. When the values of the 
king safety asymmetry scaling factor are negated as in Figure 2, it becomes 
visually clear from their trajectories that the two terms are behaving in a 
codependent manner. More investigation is required to determine the root 
cause of this behaviour. 



7. Conclusion 

We have proposed a new procedure for optimizing static evaluation 
functions based upon globally ordering a multiplicity of positions in a 
consistent manner. This application of ordinal correlation is fundamentally 
different from prior evaluation function tuning techniques. We believe it is 
worth further exploration, and hope it will lead to a new perspective and 
fresh insights about decision making in game-tree search. 

While our initial results show promise, more work is certainly needed. It 
is important to keep in mind that we tuned feature weights in accordance 
with human assessments. Doing so may simply not be optimal for computer 
play. Nonetheless, it is worth noting that having reduced the playing ability 
of a grandmaster-level program to candidate master strength by significantly 
altering several important feature weights, the learning algorithm was able to 
restore the program to grandmaster strength. 

7.1 Reflection 

Having identified the anomalous behaviour in Figure 2, it is worth looking 
again at Figure 1. The match results suggest that all productive learning 
occurred by iteration 400 at the latest, after which a small but perceptible 
decline appears to occur. The undesirable codependency between the king 
safety and king safety asymmetry scaling factors also appears to be present 
in the later iterations of the first experiment. 

Furthermore, our training data is small enough (n = 32,768) that 
overfitting is a consideration. Future learning experiments should use more 
positions. This may in turn reduce the search effort required per position to 
tune weights well. Although we are not certain why larger searches improve 
the quality of learning, as the amount of search used per machine assessment 
increases, the amount of information gathered about how relative weights 
interact also increases. On the surface, then, the improvement is not 
illogical. 

While some weights, for instance the positional rook terms, learned 
nearly identical values in both experiments, other features exhibited more 
variance. For cases such as the king tropism and blocked pawns scaling 
factors, it could be that comparable performance may be achieved with a 
relatively wide range of values. 

In our reported experiments, computation of τ was dominated by the 
search effort to generate machine assessments, enough so that the use of 
Spearman’s ρ (or perhaps even Pearson correlation, notwithstanding our 
original rationale) may also have been possible. Maximizing these 
alternative metrics could be tried, at least when the training data contains 
relatively few positions. Other optimization strategies, for instance genetic 
algorithms, could also be tried. 

It was not originally planned to attempt to maximize τ only upon 
assessments at a specific level of search effort. Unfortunately, we 
encountered implementation difficulties, and so reverted to the approach 
described herein. We had intended to log the node number or time point 
along with the new score whenever the evaluation of a position changes. 
This would have, without the use of excessive storage, provided the precise 
score at any point throughout the search. We would have tuned to maximize 
the integral of τ over the period of search effort. Implementation of this 
algorithm would more explicitly reward reaching better evaluations more 
quickly, improving the likelihood of tuning feature weights and perhaps even 
search control parameters effectively. 

7.2 Future Directions 

While our experiments used chess assessments from humans, it is possible to 
use assessments from deeper searches and/or from a stronger engine, or to 
tune a static evaluation function for a different domain. Depending on the 
circumstances, merging consecutively-ordered fine-grained assessments into 
fewer, larger categories may be desirable. Doing so could even become 
necessary should the computation of τ dominate the time per iteration, but 
this is unlikely unless one uses only negligible search to form machine 
assessments. 

Elidan et al. (2002) found that perturbation of training data could assist in 
escaping local maxima during learning. Our implementation of τ, designed 
with this finding in mind, allows non-integer weights to be assigned to each 
cell. Perturbing the weights in an adversarial manner as local maxima are 
reached, so that positions are weighted slightly more important when 
generally discordant, and slightly less important when generally concordant, 
could allow the learner to continue making progress. 

It would also be worthwhile to examine positions of maximum 
disagreement between human and machine assessments, in the hope that 
study of the resulting positions will identify new features that are not 
currently present in Crafty’s evaluation. Via this process, a number of 
labeling errors would be identified and corrected. However, we do not 
believe that this would materially affect the outcome of the learning process. 

A popular pastime amongst computer chess hobbyists is to attempt to 
discover feature weight settings that result in play mimicking their favourite 
human players. By tuning against appropriate training data, e.g., from 
opening monographs and analyses published in Chess Informant and 
elsewhere that are authored by the player to be mimicked, training an 
evaluation function to assess positions similarly to how a particular player 
might actually do so should now be possible. 

Producers of top computer chess software play many games against their 
commercial competitors. They could use our method to model their 
opponent’s evaluation function, then use this model in a minimax (no longer 
negamax) search. Matches then played would be more likely to reach 
positions where the two evaluation functions differ most, providing 
improved winning chances for the program whose evaluation function is 
more accurate, and object lessons for the subsequent improvement of the 
other. 

Identifying the most realistic mapping of Crafty’s machine assessments 
to the seven human positional assessments is also of interest. This 
information would allow CRAFTY (or a graphical user interface connected to 
Crafty) to present scoring information in a human-friendly format 
alongside the machine score. 
Abstract: ProbCut is a selective-search enhancement to the standard alpha-beta 
algorithm for two-person games. ProbCut and its improved variant Multi-ProbCut 
(MPC) have been shown to be effective in Othello and Shogi, but no success had 
previously been reported for the game of chess. This paper discusses our 
implementation of ProbCut and MPC in the chess engine CRAFTY. Initial test 
results suggest that the MPC version of CRAFTY is stronger than the original 
version of CRAFTY: it searches deeper in promising lines and defeated the 
original CRAFTY +22 −10 =32 (59.4%) in a 64-game match. Incorporating 
MPC into CRAFTY also increased its tournament performance against YACE, 
another strong chess program: CRAFTY’s speed chess tournament score went 
up from 51% to 56%. 

Keywords: Selective search, ProbCut, chess 

1. Introduction 

Computer chess has been an AI research topic since the invention of the 
computer, and it has come a long way. Nowadays, the best computer chess 
programs and the best human grandmasters play at roughly the same level. Most of 
the successful chess programs use the so-called brute-force approach, in which 
the program has limited chess knowledge and relies on a fast search algorithm 
to find the best move. There has been much research on improving the original 
minimax algorithm for finding moves in two-player perfect-information games. 
Enhancements range from sound backward pruning (alpha-beta search), through 
transposition tables and iterative deepening, to selective search heuristics 
that either extend interesting lines of play or prune uninteresting parts of the 
search tree. 

The ProbCut (Buro, 1995) and Multi-ProbCut (MPC) (Buro, 1997a) heuristics 
fall into the last category. They were first implemented in Othello programs, 
where they resulted in a much better performance compared to full-width 
alpha-beta search. Utilizing MPC, LOGISTELLO defeated the reigning human Othello 
World Champion Takeshi Murakami by a score of 6-0 in 1997 (Buro, 1997b). 

ProbCut and MPC do not rely on any game specific properties. However, 
there were no previous reports of success at implementing them in the game 
of chess. In this paper we present our first implementations of ProbCut and 
MPC in a chess program and some experimental results on their performance. 
Section 2 gives some necessary background knowledge. Section 3 discusses 
our ProbCut implementation and Section 4 discusses our MPC implementation. 
Finally, Section 5 concludes and discusses some ideas for future research. 

2. Background 

There has been a lot of previous research in the field of game-tree search. 
We will not attempt to cover it all here. Instead, we will concentrate on things 
relevant to ProbCut. For an introduction to game-tree search, a good web-site 
is www.xs4all.nl/~verhelst/chess/search.html. 

2.1 Minimax and Alpha-Beta Search 

For two-person zero-sum games like chess, positions can be viewed as nodes 
in a tree or DAG. In this model, moves are represented by edges which connect 
nodes. Finding the best move in a given position then means to search through 
the successors of the position in order to find the best successor for the player 
to move after finding the best successor for the opponent in the next level of the 
tree. This procedure is called minimaxing. In practice, computers do not have 
time to search to the end of the game. Instead, they search to a certain depth, 
and use a heuristic evaluation function to evaluate the leaf nodes statically. For 
chess, the evaluation function is based on material and other considerations 
such as king safety, mobility, and pawn structure. 

An important improvement over minimax search is alpha-beta pruning (Knuth 
and Moore, 1975). An alpha-beta search procedure takes additional parameters 
alpha and beta, and returns the correct minimax value (up to a certain depth) 
if the value is inside the window (alpha, beta). A returned value greater than or 
equal to beta is a lower bound on the minimax value, and a value less than or 
equal to alpha is an upper bound. These cases are called fail-high and fail-low, 
respectively. A pseudo-code representation of one version of the algorithm 
is shown in Figure 1. The algorithm shown is called “fail-hard” alpha-beta, 
because it generally returns alpha for fail-lows and beta for fail-highs. There 




First Experimental Results of ProbCut Applied to Chess 






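// Engine routines assumed to exist elsewhere (prototypes are illustrative)
int  Evaluation(void);     // static evaluation of the current position
int  GenerateMoves(void);  // generates the moves, returns how many
void MakeMove(int i);
void UndoMove(int i);

int AlphaBeta(int alpha, int beta, int height) {
    if (height == 0) return Evaluation();

    int total_moves = GenerateMoves();
    for (int i = 0; i < total_moves; i++) {
        MakeMove(i);
        int val = -AlphaBeta(-beta, -alpha, height - 1);
        UndoMove(i);

        if (val >= beta) return val;   // fail-high: cut off this node
        if (val > alpha) alpha = val;  // better move inside the window
    }
    return alpha;
}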

Figure 1. The alpha-beta algorithm (fail-hard version). 

exist “fail-soft” versions of alpha-beta which can return values outside of the 
alpha-beta window, thus giving better bounds when the search fails high or low. 

There have been a number of enhancements to alpha-beta, e.g. transposition 
tables, iterative deepening, NegaScout, etc. (Reinefeld, 1983; Junghanns, 
1998). Armed with these refinements, alpha-beta has become the dominant 
algorithm for game tree searching (Junghanns, 1998). 

Compared to minimax, alpha-beta is able to prune many subtrees that would 
not influence the minimax value of the root position. But it still spends most of its 
time calculating irrelevant branches that human experts would never consider. 
Researchers have been trying to make the search more selective, while not 
overlooking important branches. How should we decide whether to search a 
particular branch or not? One idea is to base this decision on the result of a 
shallower search. The null-move heuristic (Beal, 1990; Donninger, 1993) and 
ProbCut are two approaches based on this idea. 

2.2 The Null-Move Heuristic 

A null-move is equivalent to a pass: the player does nothing and lets the 
opponent move. Passing is not allowed in chess, but in chess games it is almost 
always better to play a move than to pass. The null-move heuristic (or 
null-move pruning) takes advantage of this fact: before searching the regular 
moves for height − 1 plies as in alpha-beta, it does a shallower search on the 
null-move for height − R − 1 plies, where R is usually 2. If the search on the 
null-move returns a value greater than or equal to beta, then it is very likely 
that one of the regular moves will also fail-high. In this case we simply return 
beta after the search on the null-move. This procedure can even be applied 
recursively in the shallower search, as long as no two null-moves are played 
consecutively. 

Because the search on the null-move is shallower than the rest, occasionally 
it will overlook something and mistakenly cut the branch, but the speed-up from 
cutting these branches allows it to search deeper on more relevant branches. The 
benefits far outweigh the occasional mistakes. However, in chess endgames 
with few pieces left, zugzwang positions are often encountered, in which any 
move will deteriorate the position. The null-move heuristic fails badly in 
zugzwang positions. As a result, chess programs turn off the null-move heuristic 
in late endgames. 

There has been some research to further fine-tune and improve the null-move 
heuristic. Adaptive Null-Move Pruning (Heinz, 1999) uses R = 3 for 
positions near the root of the tree and R = 2 for positions near the leaves of 
the tree, as a compromise between the too aggressive R = 3 and the robust 
but slower R = 2. Verified Null-Move Pruning (Tabibi and Netanyahu, 2002) 
uses R = 3, but whenever the shallow null-move search returns a fail-high, 
instead of cutting, the search is continued with reduced depth. Verified 
null-move pruning can detect zugzwang positions and has better tactical 
strength, while searching fewer nodes than standard R = 2. 

The null-move heuristic is very effective in chess, and most of the strong 
chess engines use it. But it depends on the property that the right to move has 
positive value, so it is not useful for games like Othello and checkers, in which 
zugzwang positions are common. 

2.3 ProbCut 

ProbCut is based on the idea that the result $v'$ of a shallow search is a rough 
estimate of the result $v$ of a deeper search. The simplest way to model this 
relationship is by means of a linear model: 

$$v = a \cdot v' + b + e,$$

where $e$ is a normally distributed error variable with mean 0 and standard 
deviation $\sigma$. The parameters $a$, $b$, and $\sigma$ can be computed by 
linear regression applied to the search results of thousands of positions. 

If, based on the value of $v'$, we are certain that $v \geq \beta$, where $\beta$ 
is the beta-bound for the search on the current subtree, we can prune the 
subtree and return $\beta$. After some algebraic manipulations, the above 
condition becomes $(a v' + b - \beta)/\sigma \geq -e/\sigma$. This means that 
$v \geq \beta$ holds true with probability of at least $p$ iff 
$(a v' + b - \beta)/\sigma \geq \Phi^{-1}(p)$. Here, $\Phi$ is the cumulative 
distribution function of the standard Normal distribution. Assuming $a > 0$, 
this inequality is equivalent to 
$v' \geq (\Phi^{-1}(p) \cdot \sigma + \beta - b)/a$. Similarly for 
$v \leq \alpha$, the condition becomes 
$v' \leq (-\Phi^{-1}(p) \cdot \sigma + \alpha - b)/a$. 
This leads to the pseudo-code implementation shown in Figure 2. Note that 
the search windows for the shallow searches are set to have width 1. These are 
called null-window searches. Generally, the narrower the window is, the earlier 
the search returns. Null-window searches are very efficient when we do not 
care about the exact minimax value and only want to know whether the value 
is above or below a certain bound, which is the case here. The depth pair and 
cut threshold are to be determined empirically, by checking the performance of 
the program with various parameter settings. 
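
The two bounds derived above can be computed as in the following sketch. The function names are hypothetical, the parameter t stands for the cut threshold (the value of the inverse standard Normal distribution function at p; for instance t = 1.0 corresponds to p of roughly 0.84), and a positive slope a is assumed. Rounding is done by hand so that the block does not depend on the math library.

```c
/* Round to the nearest integer, halves away from zero. */
static int round_to_int(double x) {
    return (int)(x >= 0.0 ? x + 0.5 : x - 0.5);
}

/* Shallow-search bound above which v >= beta is considered likely.
 * Assumes a > 0. */
static int upper_cut_bound(double a, double b, double sigma,
                           double t, int beta) {
    return round_to_int((t * sigma + beta - b) / a);
}

/* Shallow-search bound below which v <= alpha is considered likely.
 * Assumes a > 0. */
static int lower_cut_bound(double a, double b, double sigma,
                           double t, int alpha) {
    return round_to_int((-t * sigma + alpha - b) / a);
}
```

With the made-up values a = 1, b = 0, sigma = 50, and t = 1.0, a window of (-100, 100) yields shallow bounds of -150 and 150: the shallow search must clear the original bound by one standard deviation before a cut is made.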

For ProbCut to be successful, $v'$ needs to be a good estimator of $v$, with a fairly
small $\sigma$. This means that the evaluation function needs to be a fairly accurate
estimator of the search results. Evaluation functions for chess are generally 
not very accurate, due to opportunities of capturing which cannot be resolved 
statically. Fortunately, most chess programs conduct a so-called quiescence 
search: at the leaves of the game tree where the regular search height reaches 
zero, instead of calling the evaluation function, a special quiescence search 
function is called to search only capturing moves, only using the evaluation 
function’s results when there are no profitable capturing moves. Quiescence 
search returns a much more accurate value. 

In summary, the null-move heuristic and ProbCut both try to compensate 
for the lower accuracy of the shallow search by making it harder for the shal- 
low search to produce a cut. The null-move heuristic does this by giving the 
opponent a free move, while ProbCut widens the alpha-beta window. 



#define S 4    // depth of shallow search
#define H 8    // check height
#define T 1.0  // cut threshold

int AlphaBeta(int alpha, int beta, int height) {
    if (height == 0) return Evaluation();

    if (height == H) {
        int bound;

        // is v >= beta likely?
        bound = round((T * sigma + beta - b) / a);
        if (AlphaBeta(bound-1, bound, S) >= bound)
            return beta;

        // is v <= alpha likely?
        bound = round((-T * sigma + alpha - b) / a);
        if (AlphaBeta(bound, bound+1, S) <= bound)
            return alpha;
    }

    // The rest of alpha-beta code goes here
}

Figure 2. ProbCut implementation with depth pair (4,8) and cut threshold 1.0.

2.4 Multi-ProbCut

MPC enhances ProbCut in several ways: 

■ Allowing different regression parameters and cut thresholds for different 
stages of the game. 

■ Using more than one depth pair. For example, when using depth pairs 
(3,5) and (4,8), if at check height 8 the 4-ply shallow search does not 
produce a cut, then further down the 8-ply subtree we could still cut some 
5-ply subtrees using 3-ply searches. 

■ Internal iterative deepening for shallow searches. 

Figure 3 shows pseudo-code for a generic implementation of MPC. The 
MPC search function is not recursive in the sense that ProbCut is not applied 
inside the shallow searches. This is done to avoid the collapsing of search depth. 
In the case of Othello, MPC shows significant improvements over ProbCut. 

2.5 ProbCut and Chess 

There has been no report of success for ProbCut or MPC in chess thus far. 
There are at least two reasons for this: 

1 The null-move heuristic has been successfully applied to chess. Null- 
move and ProbCut are based on similar ideas. As a result they tend to 
prune the same type of positions. Part of the reason why ProbCut is 
so successful in Othello is that the null-move heuristic does not work 
in Othello because it is a zugzwang game. But in chess, ProbCut and
MPC have to compete with the null-move heuristic, which already improves upon
brute-force alpha-beta search.

2 The probability of a chess search making a serious error is relatively high,
probably due to the higher branching factor (Junghanns et al., 1997). This
leads to a relatively large standard deviation in the linear relationship be- 
tween shallow and deep search results, which makes it harder for ProbCut 
to prune sub-trees. 

In the GAMES group at the University of Alberta there were attempts
to make ProbCut work in chess in 1997 (Junghanns and Brockington, 2002).
However, the cut thresholds were chosen too conservatively, resulting in weak
performance.

Recently, researchers in Japan have successfully applied ProbCut to Shogi 
(Shibahara, Inui, and Kotani, 2002). In Shogi programs forward pruning meth- 
ods are not widely used, because Shogi endgames are much more volatile than 
chess endings. Therefore, ProbCut by itself can easily improve search perfor- 
mance compared with plain alpha-beta searchers. As mentioned above, gaining 
improvements in chess, however, is much harder because of the already very 
good performance of the null-move heuristic.

#define MAX_STAGE  2   // e.g. middle-game, endgame
#define MAX_HEIGHT 10  // max. check height
#define NUM_TRY    2   // max. number of checks

// ProbCut parameter sets for each stage and height
struct Param {
    int d;          // shallow depth
    float t;        // cut threshold
    float a, b, s;  // slope, offset, std.dev.
} param[MAX_STAGE+1][MAX_HEIGHT+1][NUM_TRY];

int MPC(int alpha, int beta, int height) {

    // ProbCut check
    if (height <= MAX_HEIGHT) {
        for (int i=0; i < NUM_TRY; i++) {
            int bound;
            Param &pa = param[stage][height][i];

            // skip if there are no parameters available
            if (pa.d < 0) break;

            // is v_height >= beta likely?
            bound = round((pa.t*pa.s+beta-pa.b)/pa.a);
            if (AlphaBeta(bound-1, bound, pa.d) >= bound)
                return beta;

            // is v_height <= alpha likely?
            bound = round((-pa.t*pa.s+alpha-pa.b)/pa.a);
            if (AlphaBeta(bound, bound+1, pa.d) <= bound)
                return alpha;
        }
    }

    // the remainder of the alpha-beta algorithm
}

Figure 3. Multi-ProbCut implementation. AlphaBeta() is the original alpha-beta search
function.



3. ProbCut Implementation 

Before trying MPC, we implemented the simpler ProbCut heuristic with 
one depth pair and incorporated it into Crafty (version 18.15) by Hyatt. 1 



1 Crafty’s source code is available at ftp://ftp.cis.uab.edu/pub/hyatt.

Crafty is a state-of-the-art free chess engine. It uses a typical brute-force
approach, with a fast evaluation function, NegaScout search, and all the
standard enhancements: transposition table, null-move heuristic, etc.
Crafty also utilizes quiescence search, so the results of its
evaluation function plus quiescence search are fairly accurate.

The philosophy of our approach is to take advantage of the speed-up provided 
by the null-move heuristic whenever possible. One obvious way to combine 
the null-move and ProbCut heuristics is to view null-move search as part of 
the brute-force search, and build ProbCut on top of the “alpha-beta plus null- 
move” search. Applying the necessary changes to Crafty is easy. We put the 
ProbCut shallow search code in front of the null-move shallow search code. 
We also implemented the MPC feature that allows different parameters to be 
used for middle-game and endgame. 

Before ProbCut-Crafty could be tested, the parameters of the linear ProbCut
opinion change model had to be estimated. We let Crafty search (using
alpha-beta with the null-move heuristic) around 2700 positions and record
its search results for 1, 2, ..., 10 plies. The positions were chosen randomly
from some computer chess tournament games and some of Crafty’s games
against human grandmasters on internet chess servers. Note that Crafty
was using the null-move heuristic for these searches.

Then we fitted the linear regression model for several depth pairs and game 
phases, using the data collected. The results indicate that shallow and deep 
search results are correlated, as shown in Figure 4. However, the fit is not 
perfect. The v' versus v relation has the following characteristics. 

■ The slope is closer to 1.0 and the standard deviation smaller for v' data
points closer to zero. For example, for depth pair (4, 8), and v' data points
in the range [-300, 300], the slope is 1.07 and the standard deviation is
83; for v' data points in the range [-1000, 1000], the slope is 1.13 and the
standard deviation is 103. This can be explained as follows: if, say, White
has a big advantage, then White will likely gain more material advantage
after a few more moves. Therefore, if the shallow search returns a big
advantage, a deeper search will likely return a bigger advantage, and
vice versa for disadvantages. We only used v' data points in the range
[-300, 300] for the linear regression.

■ Occasionally the shallow search misses a check-mate while the deeper
search finds it. For example, in a position where White can check-mate in 7
plies, a 4-ply search cannot find the check-mate while an 8-ply search
can. For the depth pair (4, 8), and v' data points in the range
[-300, 300], this happens roughly once every 1000 positions.
A check-mate-in-N-moves is represented by a large integer in Crafty. We
excluded these data points from the linear regression: because the evaluation
of check-mate is a rather arbitrary large number, there is no proper
way to incorporate these data points in the linear regression.

Figure 4. v' versus v for depth pair (4,8). The evaluation function’s scale is 100 = one pawn,
i.e. a score of 100 means the player to move is one pawn up (or has equivalent positional
advantage).

Table 1. Linear regression results. The evaluation function’s scale is 100 = one pawn; r is the
regression correlation coefficient, a measure of how well the data fits the linear model.

We also fitted model parameters for different game stages. It turned out that 
the standard deviation for the fit using only endgame positions 2 is smaller than 
the standard deviation using only middle-game positions. Table 1 shows some 
of the results. 
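The endgame criterion given in footnote 2 is simple enough to state in code. The following is a minimal sketch with hypothetical types; it does not reproduce Crafty's own data structures.

```cpp
#include <cassert>

// Hypothetical piece counts for one side (pawns are ignored by the criterion).
struct PieceCounts {
    int queens;
    int rooks;
    int bishops;
    int knights;
};

// Weighted material: Queen 9, Rook 5, Knight/Bishop 3, pawns do not count.
int weighted_material(const PieceCounts& pc) {
    assert(pc.queens >= 0 && pc.rooks >= 0 && pc.bishops >= 0 && pc.knights >= 0);
    return 9 * pc.queens + 5 * pc.rooks + 3 * (pc.bishops + pc.knights);
}

enum Stage { MIDDLE_GAME = 0, ENDGAME = 1 };

// A position counts as an endgame if both sides are below 15 points.
Stage game_stage(const PieceCounts& white, const PieceCounts& black) {
    const bool endgame = weighted_material(white) < 15 &&
                         weighted_material(black) < 15;
    return endgame ? ENDGAME : MIDDLE_GAME;
}
```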

We conducted some experiments 3 with different depth pairs and cut thres- 
holds. Depth pairs (4, 6) and (4, 8), and cut thresholds 1.0 and 1.5 were tried. 



2 In Crafty endgame positions are defined as those in which both players have weighted material count 
less than 15. Here Queen is 9, Rook is 5, Knight/Bishop is 3, and Pawns do not count. 

3 All initial experiments were run on Pentium-3/850MHz and Athlon-MP/1.66GHz machines under Linux, 
whereas the later tournaments were all played on Athlon-MP/2GHz machines. Crafty’s hash table size

We used two types of tests. First, we tested the search speed by running
fixed-time searches and looking at the depths reached. If a ProbCut version is not faster
than the plain null-move version, then the ProbCut version is clearly no good.
If a ProbCut version is faster than null-move, it is still not necessarily better.
So to test the overall performance, we then ran matches between the promising
ProbCut versions and the original Crafty.

We let the program search about 300 real-game positions, spending 30 seconds
on each position, and recorded how deep it was able to search on average.
The results show that:

■ Versions with depth pairs (4,6) and (4,8) have similar speeds. 

■ The versions with cut threshold 1.5 are not faster than plain Crafty. 

■ The versions with cut threshold 1.0 are slightly faster than Crafty:
they search 11.6 plies compared to 11.5 plies by Crafty. In some
positions, 80-90% of the shallow searches result in cuts, and ProbCut is
much faster than plain Crafty. But in some other positions the shallow
searches produce cuts less than 60% of the time, and ProbCut is about the
same speed or even slower than Crafty. On average, this version of
ProbCut produces more cuts than plain Crafty’s null-move heuristic
does at the check height.

Because the cut threshold 1.5 is no good, we concentrated on the threshold 1.0 
for the following experiments. We ran matches between the ProbCut versions 
and plain Crafty. Each side has 10 minutes per game. A generic opening 
book was used. Endgame databases were not used. A conservative statistical 
test 4 shows that in a 64-game match, a score above 38 points (or 59%) is 
statistically significant with p < 0.05. Here a win counts one point and a draw 
counts half a point. 
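The order of magnitude of this threshold can be checked with a normal approximation. The sketch below is not Amir Ban's program. It assumes equally strong opponents, independent games and a fixed draw probability, and the function name is ours.

```cpp
#include <cassert>
#include <cmath>

// Match score (in points) lying z standard deviations above the expected
// score games/2 of two equally strong programs. Normal approximation only.
double significance_threshold(int games, double draw_rate, double z) {
    assert(games > 0 && draw_rate >= 0.0 && draw_rate <= 1.0);
    // A game scores 1, 0.5 or 0. With equal win and loss probabilities
    // the variance of a single game score is (1 - draw_rate) / 4.
    const double var_per_game = (1.0 - draw_rate) / 4.0;
    const double sd = std::sqrt(games * var_per_game);
    return games / 2.0 + z * sd;
}
```

With 64 games, a draw rate of 0.3 and z = 1.96, this gives about 38.6 points, which is in line with the threshold quoted above. A lower draw rate increases the variance and hence the threshold, so assuming few draws is the conservative choice.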

The match results are not statistically significant. The ProbCut versions
seem to be neither better nor worse than plain Crafty. For comparison, we ran
a 64-game match of ProbCut against Crafty with null-move turned off for 
both programs. The ProbCut version is significantly better than Crafty here, 
winning the match 40-24. 

was set to 48 MBytes, and the pawn hash table size to 6 MBytes. Opening books and thinking on opponent’s
time were turned off.

4 The statistical test is based on the assumption that at least 30% of chess games between these
programs are draws, which is a fair estimate. The test is based on Amir Ban’s program from his posting
on rec.game.chess.computer:

http://groups.google.com/groups?hl=en&lr=&ie=UTF-8&selm=33071608.796A%40msys.co.il

4. Multi-ProbCut Implementation and Results

ProbCut produces more cuts than the plain null-move heuristic does, but
it seems that the small speed-up provided by ProbCut is not enough to result
in better playing strength. This motivates our implementation of MPC. We
already have different regression parameters for middle-game and endgame in
our ProbCut implementation. Now we implemented multiple depth pairs. The
implementation was straightforward, much like the pseudo-code in Figure 3.

Table 2. Endgame threshold optimization results. Reported are the point percentages for
MPC-Crafty playing 64-game tournaments against Crafty using different values for the endgame
cut thresholds. Game timing was 2 minutes per player per game plus 12 seconds increment on
an Athlon-MP 1.67 GHz. The middle-game threshold was fixed at 1.0.

1.0  1.1  1.2  1.3
MPC %  54.7  59.4  57.8  58.6  59.4  53.1

Table 3. Middle-game threshold optimization results. With the endgame threshold fixed at
1.0 we repeated the 64-game tournaments now using faster hardware (Athlon-MP 2 GHz) that
just became available and longer time controls: 10 minutes per player per game plus 60 seconds
increment. Each tournament took about eight CPU days.

After initial experiments which showed that the null-move heuristic excels 
at small heights, we chose depth pairs (2,6), (3,7), (4,8), (3,9), and (4,10) for 
endgames and middle-games. Another reason for choosing pairs with increas- 
ing depth differences is that otherwise the advantage of MPC rapidly diminishes 
in longer timed games. We tested the speed of the MPC implementation using 
a cut threshold of 1.0 on the same 300+ positions as in Section 1.3. With 30 
seconds per position, it is able to search 12.0 plies on average, which is 0.5 plies 
deeper than original Crafty. 

For optimizing the endgame and middle-game cut thresholds we then ran 
two sets of 64-game tournaments between MPC-Crafty and the original 
version. In the first phase we kept the middle-game cut threshold fixed at 
1.0 and varied the endgame threshold. The results shown in Table 2 roughly 
indicate good threshold choices. However, the high fluctuations suggest that 
we should play more games to get better playing strength estimates. After 
some more experimentation we fixed the endgame threshold at 1.0 and went 
on to optimizing the middle-game cut threshold by playing a second set of 
tournaments, now on faster hardware and longer time controls. Threshold pairs 
(1.2, 1.0) and (1.0, 1.0) resulted in the highest score (59.4%) against the original
Crafty version. 

In order to validate the self-play optimization results, we let MPC-Crafty
play a set of tournaments against Yace, a strong chess program written
by Dieter Buerssner, which is available for Linux and can be downloaded
from http://homel.stofanet.dk/moq/. Table 4 summarizes the promising
results, which indicate a moderate playing strength increase even against other
chess programs when using MPC.

Pairing                           Crafty %            Crafty %
                                  (2min+10sec/move)   (8min+20sec/move)

Crafty vs. Yace                   42.0%               50.8%
MPC-Crafty (1.2, 1.0) vs. Yace    53.1%               56.3%
MPC-Crafty (1.0, 1.0) vs. Yace    57.0%               55.5%

Table 4. Results of 64-game tournaments played by three Crafty versions against Yace
using two different time controls.



5. Conclusions and Further Research 

Preliminary results show that MPC can be successfully applied to chess. 
Our MPC implementation shows clear improvement over our ProbCut (plus 
variable parameters for different stages) implementation. This indicates that 
the main source of improvement in MPC is the use of multiple depth pairs. 
Due to the already good performance of the null-move heuristic in chess, the 
improvement provided by MPC in chess is not as huge as in Othello. However,
our implementation, which combines MPC and null-move heuristic, shows 
definite advantage over the plain null-move heuristic in Crafty, as shown by 
the match results. MPC is relatively easy to implement. We encourage chess 
programmers to try MPC in their chess programs. 

More experiments need to be conducted on our MPC implementation to de- 
termine how evaluation function parameters like the king safety weight can 
influence MPC’s performance. To further verify the strength of the MPC im- 
plementation, we plan to run matches with even longer time controls. 

The depth pairs and the cut threshold can be further fine-tuned. One way 
to optimize them is to run matches between versions with different parameters. 
But better results against another version of the same program do not necessarily 
translate into better results against other opponents. An alternative would be 
to measure the accuracy of search algorithms by a method similar to the one 
employed in (Junghanns et al., 1997), using a deeper search as the “oracle,” and
looking at the difference between the oracle’s evaluations on the oracle’s best 
move and the move chosen by the search function we are measuring. Maybe 
the combination of the above two methods gives a better indication of chess 
strength.

Abstract This article presents the results of an empirical experiment designed to gain
insight into the effect of the minimax algorithm on the evaluation
function. The experiment’s simulations were performed upon the KRK chess
endgame. Our results show that dependencies between evaluations of sibling 
nodes in a game tree and an abundance of possibilities to commit blunders 
present in the KRK endgame are not sufficient to explain the success of the 
minimax principle in practical game-playing as was previously believed. The 
article shows that minimax in combination with a noisy evaluation function 
introduces a bias into the backed-up evaluations and argues that this bias is 
what masked the effectiveness of the minimax in previous studies. 

Keywords: minimax principle, KRK chess endgame, evaluation-function quality, bias 



1. Introduction 

Over twenty years ago Beal (1980) set out to analyze whether and why 
values backed up from minimax search are more trustworthy than the 
heuristic values themselves. He constructed a simple mathematical model to 
analyze the minimax algorithm. To his surprise the analysis of the model 
showed that the backed-up values were actually somewhat less trustworthy 
than the heuristic values themselves. He then wrote: “This result is 
disappointing. It was hoped that the analysis would show that the probability 
of error reduced with backing-up.” A couple of years later two articles (Beal, 
1982; Bratko and Gams, 1982) simultaneously conducted further analysis 
into why minimax does yield good results in practical game-playing while 
apparently backed-up values seem less reliable; both articles reached the 
same conclusion. They argued that the true values of sibling nodes in a game 
tree are not independent of one another. This clustering of similar values is a
major feature in practical games and it was this phenomenon that Beal’s
mathematical model did not account for. The problem with the minimax 
paradigm under the assumption of independence of sibling values was also 
confirmed by Nau (1982, 1983), who called this a search-depth pathology in 
game trees. In a simulation Nau (1982) introduced strong dependencies 
between sibling nodes and discovered that this can cause search-depth 
pathology to disappear. 

However, Pearl (1984) partly disagreed with the conclusion reached by 
Beal, Bratko, Gams and Nau, and claimed that while strong dependencies 
between sibling nodes in a game tree can eliminate the pathology, practical 
games like chess do not possess dependencies of sufficient strength. He 
pointed out that few chess positions are so strong that they cannot be spoiled 
abruptly if one really tries hard to do so. He concluded that the success of 
minimax in game-playing programs is “based on the fact that common 
games do not possess a uniform structure but are riddled with early terminal 
positions, colloquially named blunders, pitfalls or traps. Close ancestors of 
such traps carry more reliable evaluations than the rest of the nodes, and 
when more of these ancestors are exposed by the search, the decisions 
become more valid.” Moreover, Schrüfer (1986) and its follow-up (Althöfer,
1989) did some further analysis of pathology in game trees. Especially 
interesting is their observation that to avoid pathology, an evaluation 
function must, among other things, have negligible probability of 
underestimating a position from the perspective of the player to move. 

All of the above studies have two things in common: (a) they accept the 
empirical evidence that the minimax principle works in practical game- 
playing programs and (b) they try to model mathematically the minimax 
algorithm and theoretically deduce what happens when heuristic values 
assigned to leaves are backed-up towards the root of the game tree. To make 
such mathematical analysis feasible the researchers are forced to make 
certain assumptions about the game they model and to make simplifications 
in their model. Thus, the results of these models are always to be viewed 
with the acknowledgement of these assumptions and simplifications in the
back of one’s mind. In contrast to that, our approach in this article is to take 
(part of) a real game with a real evaluation function and observe empirically 
what is going on when we change the search depth and the quality of the 
evaluation function. We have at our disposal an absolutely correct evaluation 
function which we can corrupt in a controlled way. We also have a minimax 
search engine that is capable of searching to very high depths because of its 
efficient implementation. 

The next section describes our choice of the game, the evaluation 
function and its artificial corruption, as well as the search engine. Section 3 
presents the results for various settings of our simulation parameters and
gives our explanations for the observed phenomena. In Section 4 we give our
conclusions and some ideas for further work. 

2. Experimental Design 

We have decided to centre our simulations on a simple subset of chess: 
the KRK endgame. In this endgame White has a King and a Rook, while 
Black has only a King. The goal for White is to mate the opponent, striving 
to do so in as few moves as possible. There are two possible outcomes of
this endgame: a win for White or a draw. While the KRK endgame is very 
simple, it still possesses all the interesting attributes: positions are of various 
difficulties (measured in the number of moves to mate), there surely exist 
dependencies between the values of sibling nodes in a game tree, and there is 
a possibility of blunders and early termination for both sides (stalemate or 
losing a Rook for White; premature mate for Black). 

We are interested in the quality of play for White under different 
conditions. Therefore, unless stated otherwise, we always look at things from 
the White player’s perspective. Also, White is our Min player and Black is 
our Max player. 

For the KRK endgame we have at our disposal an absolutely correct 
evaluation function. It tells us how many moves are needed to reach mate in 
the case that both players play optimally and is measured in moves. It is in 
the form of a database that consists of all possible legal positions and their 
evaluations. The database can be obtained from UCI Machine Learning 
Repository (Blake and Merz, 1998). The positions in the database always 
assume it is Black’s turn to move. There are two special cases: value 0 
means Black is mated and value 255 means that Black has a draw (either the 
position is a stalemate or Black can capture the white Rook). 

The database consists of 28,056 positions. There are actually over 
200,000 legal KRK positions; however, board symmetries allow for such a
reduction. A detailed description of the database and board symmetries is
given in Bain (1992). Our version of the database is implemented as an array 
of 28,056 cells and can be viewed as a sort of transposition table. Apart from 
positions having special evaluations of 0 or 255, there are 25,233 positions 
divided into 16 levels of difficulty. Positions from level 1 require one move 
(2 plies) to mate (assuming optimal play); positions from level 2 require two 
moves to mate, and so on. Positions from the most difficult level require 16 
moves (32 plies) to mate. Different levels have different numbers of
positions; for example, there are 4,553 positions of level 14 and only 390 
positions of level 16. Figure 1 shows how many cases (positions) are left 
unsolved if we applied searches of various depths without any knowledge 
apart from the rules of the game. The term ‘unsolved’ in this context means
that White has to make a move without knowing at that time the complete
move-tree that guarantees a mate. The curve starts to fall significantly 
between depths of 14 and 20 plies and after ply 20 it steeply drops towards 
zero. 




Figure 1. Number of unsolved cases as a function of search depth.

For the purpose of our experiments we corrupted the ideal evaluation 
function in a controlled manner. Our method of doing this is as follows. We 
take a position value and add to it a certain amount of Gaussian noise. The 
formula and a plot are as follows:

$$P(x) = \frac{1}{\sigma \sqrt{2\pi}} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)$$

The formula gives the probability $P(x)\,dx$ that, given the correct evaluation $\mu$
and standard deviation $\sigma$, the new (corrupted) evaluation $x$ will take on a
value in the range $[x, x + dx]$, which is a real number. The error of the new
evaluation is $\mu - x$. We do this for all positions in the database, including the
positions where Black is already mated (special value 0). The corruption is
symmetrical, meaning that there is practically equal chance that the new
evaluation will be optimistic or pessimistic. We allow $x$ to take on a negative
value; in this way we are able to preserve the symmetry for positions that
have true values close or equal to 0.

The level of corruption is controlled by the parameter $\sigma$, which is in fact
the standard deviation and which controls how dispersed the corrupted
values $x$ are around the correct values $\mu$ (the width of the hill on the plot above).
The standard deviation is measured in moves. For example, if $\sigma$ equals 0.5,
this means that approximately two thirds of corrupted evaluations are within
0.5 moves around the true evaluation and over 95% of corrupted evaluations
are within 1.0 move (two standard deviations) around the true evaluation.
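The corruption step might look as follows in C++. This is a sketch, not the authors' code: the function name, the seed parameter and the decision to leave the draw value 255 untouched are assumptions (the text does not say how draws are treated).

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Special database value for drawn positions (see the database description).
const double DRAW_VALUE = 255.0;

// Returns a corrupted copy of the true evaluations (in moves to mate).
// Gaussian noise with mean 0 and standard deviation sigma is added to every
// value, including 0 (Black already mated); results may become negative.
// Assumption: drawn positions are copied unchanged.
std::vector<double> corrupt_evaluations(const std::vector<double>& truth,
                                        double sigma, unsigned seed) {
    assert(sigma > 0.0);
    std::mt19937 rng(seed);
    std::normal_distribution<double> noise(0.0, sigma);
    std::vector<double> noisy(truth.size());
    for (std::size_t i = 0; i < truth.size(); ++i) {
        if (truth[i] == DRAW_VALUE) {
            noisy[i] = truth[i];
        } else {
            noisy[i] = truth[i] + noise(rng);
        }
    }
    return noisy;
}
```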

To be able to compare the quality of initial knowledge (evaluation 
function) to the quality of knowledge after backing up the values with the 
minimax algorithm, we have to be able to calculate the standard deviation 
after minimaxing. This is easy, because our search algorithm returns the 
backed-up values from a fixed search depth for every unique KRK position 
in an array exactly the same as our initial database. This array is in fact our 
‘backed-up’ evaluation function. We thus have one such array for every 
search depth from 0 (initial database) to 32 plies (our chosen final search
depth). After obtaining such an array, we calculate $\sigma$ with the formula:

$$\sigma = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (x_i - \mu_i)^2}$$

where $x_i$ is the backed-up corrupted value and $\mu_i$ is the true value for position
$i$. $N$ is the number of positions in the array. This gives us a tool to monitor
directly how minimax affects the quality of the evaluation function.

Our search engine is the standard fixed-depth minimax search. The only 
built-in knowledge it has is the ability to detect fatal errors for White 
(stalemates and losing a Rook); the ability to detect mates is not given. We 
were able to search to very high search depths of 32 plies and beyond (if 
desired) by exploiting the fact that the KRK endgame only has a 
comparatively small number of unique (under symmetries) positions 
(28,056) which we can all store in a sort of transposition table. We start at 
depth 0 by loading the values from a (corrupted) database, then move on to 
depth 2, perform a 2-ply minimax search and use the results of the previous 
depth as evaluations of the leaves, store results of depth 2 search, move on to 
depth 4 and so on. Such an implementation of the search algorithm allows it 
to have a linear time complexity instead of the usual and very constraining 
exponential time complexity. 
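The layered scheme can be sketched as follows. This is an illustration only: the Successors structure, the function names and the handling of positions without moves are assumptions, and the special treatment of stalemates and Rook losses described above is omitted. Adding one move per 2-ply step, so that backed-up values remain comparable with the true distance to mate, is likewise our assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical move structure: successors[p][m] lists the positions (again
// with Black to move) that White can reach after Black's m-th move in p.
using Successors = std::vector<std::vector<std::vector<int>>>;

// One 2-ply step: Black (Max) moves, White (Min) answers, and the leaves
// are scored with the table computed for the previous depth.
std::vector<double> deepen_two_plies(const Successors& successors,
                                     const std::vector<double>& previous) {
    assert(successors.size() == previous.size());
    std::vector<double> next(previous);  // positions without moves keep their value
    for (std::size_t p = 0; p < successors.size(); ++p) {
        if (successors[p].empty()) continue;
        double best_for_black = -std::numeric_limits<double>::infinity();
        for (const std::vector<int>& white_options : successors[p]) {
            assert(!white_options.empty());
            double best_for_white = std::numeric_limits<double>::infinity();
            for (int q : white_options)
                best_for_white = std::min(best_for_white, previous[q]);
            best_for_black = std::max(best_for_black, best_for_white);
        }
        next[p] = best_for_black + 1.0;  // assumption: one more move to mate
    }
    return next;
}

// Tables for depths 0, 2, 4, ..., max_depth.
std::vector<std::vector<double>> build_tables(const Successors& successors,
                                              const std::vector<double>& depth0,
                                              int max_depth) {
    std::vector<std::vector<double>> tables(1, depth0);
    for (int depth = 2; depth <= max_depth; depth += 2)
        tables.push_back(deepen_two_plies(successors, tables.back()));
    return tables;
}
```

Each table is produced by a single pass over all positions, so the total work grows linearly with the search depth.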

3. The Results of the Experiments

Figure 2 shows what happens to the quality of the evaluation function
when we change the search depth. The x-axis represents the search depth
measured in plies and the y-axis represents the standard deviation $\sigma$
measured in moves. Each curve in the graph represents a different evaluation
function; they differ in the level of their corruption (the initial $\sigma$). The
legend marks these different evaluation functions with the size of their initial
$\sigma$. The best way to separate the curves is to look at their initial corruption.
The last evaluation function with initial $\sigma$ of 20 is off the scale and its
corruption level never drops. We performed the experiments with several
evaluation functions having the same initial $\sigma$, because the introduction of
noise is a random process. We found out that the main characteristics remain
the same for all evaluation functions with the same $\sigma$ and have therefore
plotted just one evaluation function for each initial $\sigma$ in all the figures.

It seems that we have to divide the evaluation functions into two groups: 
in the first group we have evaluation functions with a (relatively) low initial 
error of less than 3.0 and in the second group those with a high initial error. 
The first group is a realistic model of ‘real-life’ evaluation functions, while 
the second group contains evaluation functions with (almost) zero 
knowledge. We can observe that evaluation functions from the first group do 
not exhibit the tendency to drop towards 0 (perfect knowledge). They drop 
slightly or remain on the same level of corruption at best; some even 
increase. In contrast, evaluation functions from the second group only 
increase with increased search depth. 

The experiment demonstrates that for the evaluation functions tested 
searching deeper does not improve the quality of the evaluation function for 
playing the KRK endgame, not even for those with a small initial level of 
corruption. The endgame undoubtedly contains dependencies between the 
values of sibling nodes in a game tree. It is also full of possibilities for 
blunders on the part of the White player. White can, after all, lose the Rook in at
most two moves if Black plays normally. This means that the two reasons 
why minimax is believed effective in practice are present and yet the 
pathology is present as well. How can this be explained? 
Figure 2. Influence of search depth on quality of evaluation function. 



One thing we were interested in was how the backed-up evaluations are 
corrupted. We were curious whether backed-up evaluations are excessively 
optimistic or excessively pessimistic. To this end we calculated the bias of a 
backed-up evaluation function. Bias is defined as: 



\[ \mathrm{bias} = \frac{1}{N} \sum_{i=1}^{N} (\mu_i - x_i) \]

where μ_i and x_i are again the true and backed-up value of position i, 
respectively. If bias is highly negative then the backed-up values are 
generally overly pessimistic and if bias is highly positive then the backed-up 
values are generally overly optimistic. Since the noise introduced into the 
various evaluation functions was symmetrical we expected the bias to be 
close to zero, meaning some backed-up evaluations are too optimistic and 
others too pessimistic. Figure 3 charts how biased various evaluation 
functions are with respect to search depth. All curves start close to zero 
and then, without exception, they all exhibit a highly positive bias. 
The higher the level of initial corruption the higher the bias gets. Most of the 
bias is acquired in transition from search depth 0 to search depth 2. These 
two levels differ the most of any two consecutive levels - depth 0 means the 
algorithm goes directly to the lookup table, while on level 2 it performs 
minimaxing for the first time. Other transitions just increase the depth of 
minimaxing. 
Figure 3. Influence of search depth on bias of the evaluation function. 



If we look closely at what is happening at the last level of minimaxing we 
can come up with an explanation why the bias occurs. On the last level we 
either have a max or a min operation. In our case we always had White to 
choose on the last level which meant a min operation. In the presence of noise 
the operation of choosing a minimum value will be biased towards lower 
values. This is not saying that the value chosen will always be lower than it 
would be without noise, but in general this will be so much more often than 
not. We can thus see that the lowermost operation of the minimax algorithm 
introduces a bias. But surely the opposite operation, finding a maximum, 
which follows on the next higher level should (partly) negate this bias? It 
does not, however, because the majority of the values it operates on are 
already biased and all it does is select one of them. The bias is actually 
introduced on that lowest level of minimaxing. 
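
The claim that a min operation over noisy values is biased towards lower values can be checked with a small simulation. The sketch below is only an illustration and is not part of the experiments reported here: it assumes Gaussian noise (the text only requires symmetric noise), and the sibling values and the function name min_bias are hypothetical. It reports the average of (true minimum - noisy minimum), so a positive number corresponds to a positive (optimistic) bias in the sense used above.

```python
import random


def min_bias(true_values, sigma, trials=20000, seed=0):
    """Average of (true minimum - minimum of noisy copies) over many trials.

    Assumption: noise is Gaussian with zero mean and standard deviation sigma.
    """
    rng = random.Random(seed)
    true_min = min(true_values)
    total = 0.0
    for _ in range(trials):
        # Corrupt every sibling value independently with symmetric noise.
        noisy = [v + rng.gauss(0.0, sigma) for v in true_values]
        total += true_min - min(noisy)
    return total / trials


# Hypothetical true values of four sibling positions (e.g. moves to mate).
siblings = [10, 10, 11, 12]

for sigma in (0.0, 0.5, 1.0, 2.0):
    print(sigma, round(min_bias(siblings, sigma), 3))
```

Although each individual value is corrupted symmetrically, the minimum of the corrupted values is on average lower than the true minimum, and the gap grows with the noise level.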

If we look at Figures 2 and 3 simultaneously we find out that the 
corruption level and the level of bias are highly correlated. For evaluation 
functions with a lower initial level of corruption (0.25 and 0.50) this 
correlation begins to manifest with the higher search depths of 16 to 20, 
while for others it begins much sooner, from a search depth of 10 for curve 
0.75 and from a search depth of 4 for curve 1.0. For curves with initial 
corruption higher than 1.0 the correlation is strong immediately from search 
depth 2 onward. It is no wonder then that backed-up evaluation functions 
could not get any better - they were prevented from doing so by the bias. 
However, bias, at least in general, equally affects all evaluations. It 
resembles adding a constant to all evaluations. This in turn means that we do 
not change the order of the available moves relative to one another in the 
position we are trying to evaluate. If this is true, then minimax actually does 
improve the evaluation, but this is not visible on the surface because of the bias. 




Figure 4. Influence of search depth on quality of evaluation function with bias accounted for (plot title: Search vs Knowledge (with bias); axis label: Search depth). 



To confirm this claim, we calculated another statistic, a standard 
deviation as before. However, this time we have taken into account that all 
the values were shifted away from the true values by the bias. The formula 
for this statistic, σ', is: 

\[ \sigma' = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (\mu_i - x_i - \mathrm{bias})^2} \]

where μ_i and x_i are again the true and backed-up value of position i, 
respectively. How this new standard deviation changes in relation to 
search depth is shown in Figure 4. Here we can see that it drastically falls 
with the increase of search depth. Again, the evaluation functions from the 
two groups defined earlier behave differently. Evaluation functions from 
group one drop and then stay more or less on the same level, while 
evaluation functions from group two first drop and then start to rise back up 
again. This positive effect of the minimax principle with increasing search 
depth on the evaluation functions from group one is exactly the result that 
Beal was expecting from his model in 1980, but he was unable to prove it 
because the model did not account for the introduced bias. 
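
The effect of accounting for the bias can be illustrated on a toy data set. The following sketch is hedged: it assumes that the uncorrected statistic is the root-mean-square difference between true and backed-up values, and that the corrected statistic subtracts the bias from every difference before squaring. The function names and the sample values are hypothetical.

```python
import math


def bias(true_vals, backed_up):
    """Mean of (true value - backed-up value) over all positions."""
    return sum(m - x for m, x in zip(true_vals, backed_up)) / len(true_vals)


def rms_error(true_vals, backed_up, shift=0.0):
    """Root-mean-square of (true - backed-up - shift).

    With shift=0 this is the plain corruption level (assumed definition);
    with shift=bias it is the bias-corrected statistic.
    """
    squared = [(m - x - shift) ** 2 for m, x in zip(true_vals, backed_up)]
    return math.sqrt(sum(squared) / len(squared))


mu = [5, 8, 12, 16]   # hypothetical true values
x = [3, 6, 10, 14]    # every backed-up value is 2 too low

b = bias(mu, x)
print(b, rms_error(mu, x), rms_error(mu, x, shift=b))
```

In this extreme case all backed-up values are shifted by the same constant, so the plain statistic reports an error of 2 while the corrected one reports 0: the ordering of positions is untouched even though every value is wrong in absolute terms.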

Up to this point, we have not said anything about how well a computer 
program using one of our corrupted evaluation functions would actually 
play. The answer is given in Figure 5. We have played out all unique KRK 
positions except the ones with special values of 0 or 255, in total 25,233 
positions. White was guided by a corrupted evaluation function. 
Additionally, White was allowed to use a simple mechanism to avoid 
repeating the same position over and over again. The mechanism kept a list 
of all positions that already occurred in the game and if the position was to 
be repeated a different move was selected (the next best move according to 
the evaluation function). Black was always playing optimally. We measured 
the quality of play as the average number of moves above what an optimal 
white player (using a non-corrupted database) would need. This statistic is 
computed as the difference between the number of moves spent by White for 
all positions and the number of moves needed for all positions using optimal 
play, divided by the number of positions (25,233). The curves representing 
play using evaluation functions with initial corruption level of 5 and 20 are 
off the scale and result in play that is not even able to mate the opponent 
within the required 50 moves. 
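
The quality-of-play statistic and the repetition-avoidance mechanism described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used in the experiments: the names average_excess_moves and pick_move are hypothetical, a lower evaluation is assumed to be better for White, and the fallback used when every candidate repeats a position is our own choice because the text does not specify one.

```python
def average_excess_moves(moves_spent, optimal_moves):
    """(total moves spent - total optimal moves) / number of positions."""
    if len(moves_spent) != len(optimal_moves):
        raise ValueError("both lists must cover the same positions")
    return (sum(moves_spent) - sum(optimal_moves)) / len(moves_spent)


def pick_move(candidates, seen):
    """Pick the best-evaluated move that does not repeat a position.

    candidates: list of (evaluation, resulting_position) pairs;
                a lower evaluation is assumed to be better for White.
    seen:       set of positions that already occurred in the game.
    """
    ranked = sorted(candidates, key=lambda c: c[0])
    for _, position in ranked:
        if position not in seen:
            return position
    # Assumption: if every move repeats a position, play the best one anyway.
    return ranked[0][1]


print(average_excess_moves([10, 12, 9], [8, 12, 9]))
print(pick_move([(3, "A"), (5, "B")], seen={"A"}))
```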

In Figure 5 we can see that an evaluation function with an initial corruption 
level of 0.25 moves provides practically optimal play already at 
search depth 0. The quality of play using other evaluation functions 
gradually increases with deeper searches until it reaches a sort of threshold 
for a given evaluation function. From that point onward the quality of play 
remains more or less on the same level. This is true for evaluation functions 
with an initial corruption level below 3. Those evaluation functions with an initial 
corruption level higher than 3 result in play that is not even good enough to 
mate the opponent within the required 50 moves. We can observe a 
correlation between the quality of play in Figure 5 and the knowledge 
corruption level of the evaluation functions in Figure 4. 




Figure 5. Quality of play using corrupted evaluation functions. 



4. Conclusions and Further Work 

Some theoretical studies of the minimax principle in the past have shown 
that it has a negative effect on the quality of the evaluation function. To 
explain why it is nevertheless successful in practice, they suggested two 
reasons: (a) dependencies between the true values of sibling nodes in a game 
tree, and (b) the existence of traps that cause early terminations of the game. 

We have taken the opposite approach to the problem; we tried to check 
empirically these conclusions using the KRK chess endgame. We can 
confirm that the minimax algorithm appears to be a poor preserver of the 
knowledge built into the evaluation. Yet, regardless of that, it proved to be 
still successful in actual play (Figure 5). It turns out that even with 
dependencies between evaluations of sibling nodes in a game tree and an 
abundance of possibilities to commit blunders present in our endgame, the 
anomaly still existed. 

However, we claim that the minimax principle in combination with noisy 
evaluation functions introduces a bias into backed-up evaluations. This bias 
is the reason why mathematical models could not prove the effectiveness of 
minimax that was observed in practice. Once bias is properly accounted 
for, the positive influence of the minimax principle with increasing search 
depth is unmasked. The main problem is that bias moves the backed-up 
evaluations away from the true values, hence causing the illusion that they 
are more corrupted. Yet, since it affects all the evaluations more or less 
equally, it does not affect the relative ordering of the available moves with 
respect to their quality. So, if we look at the evaluations in absolute terms 
they are increasingly corrupted with a growing bias, but if we look at them in 
relative terms they become progressively better with higher search depths. 

In view of the presented results it would be very interesting to recheck 
our results using a more complex game - perhaps a KQKR or KRKN chess 
endgame, or some artificially designed game. A further study of how the 
bias behaves and what affects it is also necessary. Especially interesting 
would be to investigate what happens with the bias if we mix the functions 
(min and max) at the lowest level of minimaxing - some branches we search 
to even depths, others to odd depths. 
Abstract: The fact that the strong side cannot enforce a win in KNNK makes many chess 
players (both humans and computers) prematurely regard KNNKB and KNNKN 
as trivially drawn too. This is not true, however, because there are several tricky 
mate themes in KNNKB and KNNKN which occur more frequently and require 
more complicated handling than common wisdom suggests. The text analyzes the 
mate themes and derives rules from them which allow for the static recognition 
of potential wins in KNNKB and KNNKN without further lookahead by search. 
Although endgame databases achieve the same goal, they are normally far less 
efficient at doing so because of their additional I/O and memory requirements 
(even when compressed). 

Keywords: computer chess, endgame play, KNNKB, KNNKN, static recognition 



1. Introduction 

Usually, two bare Knights are not much of a force when it comes to mating 
in late endgames such as KNNK, KNNKB, and KNNKN. It is well-known that 
these endgames are generally drawn despite the substantial material advantage 
enjoyed by the strong side (Thompson, 1991; The Editors, 1992; Nalimov, 
Haworth, and Heinz, 2000, 2001). Human chess players and chess programs 
alike tend to incorporate rules of thumb classifying bare KNN constellations 
as most unlikely to win. Thus, common chess wisdom avoids KNN types of 
positions when being ahead in material and goes for them otherwise. A crude 
way to implement the heuristic is by scoring essentially all KNNK, KNNKB, 
and KNNKN positions as draws. Like many others, an early version of our own 
chess program DarkThought (Heinz, 1997, 2000) did exactly this back in 
mid-1995. Then, at the end of some blitz test games, it encountered the two 
positions shown in Figures 1 and 2 where it happily went for the continuations 
leading to Figures 3 and 4 respectively, mistakenly scoring them both as draws. 
DarkThought played without endgame databases and was short on time, so it 
saw the loss only after having manoeuvred itself into it. 

Figure 3. White mates in 2 moves: 1. Nd5! and 2. Nc7# or 2. Nb6#. 

Figure 4. White mates in 3 moves: 1. Nc3 {Nf4}, 2. Nd5, 3. Nc7# {Nb6#}. 

*This work originally started back in the mid-1990s while the author was still a Ph.D. candidate at the 
School of Computer Science, University of Karlsruhe, Germany, and then continued throughout his stay as 
a postdoctoral fellow at the M.I.T. Laboratory for Computer Science, USA, from 1999 to 2001. 

Of course, the aforementioned scenario with the so-called “horizon effect” 
visible at low search depths is nothing unusual in computer chess. It was quite 
special, however, that the horizon effect occurred with full severity (score 
dropping from draw to being mated) in seemingly trivial circumstances (KNNKB 
and KNNKN). This strongly aroused my curiosity and sparked the work that 
eventually led to the development of interior-node recognizers (Heinz, 1998, 
2000), knowledgeable RAM-based endgame databases (Heinz, 1999a, 2000), 
and efficient endgame indexing (Heinz, 1999b, 2000; Nalimov et al., 2000, 
2001) plus their implementation in DarkThought. Hence, those rather 
innocent-looking two positions from Figures 1 and 2 were in fact instrumental 
for much of my endgame-related research to date. 

Solving the KNNKN “mate in 3” of Figure 4 requires a 5-ply search with 4 
quiet half-moves before the final checkmate: White’s 1. Nc3 {Nf4} and 2. Nd5 
plus Black’s respective answers. Consequently, standard quiescence searches 
following either checks and captures or captures only cannot spot the win unless 
supported by lucky hits in the transposition table. The same holds for normal full 
searches with remaining depths of ≤ 3 plies in case of capture-check quiescence 
and ≤ 4 plies in case of capture-only quiescence. Therefore, the search alone 
most likely fails to resolve the mate in this simple position if it occurs far out 
near the lookahead boundary. According to Thompson (1991), The Editors 
(1992), and Nalimov et al. (2000, 2001) the endgames KNNKB and KNNKN 
feature even harder positions than the ones from Figures 3 and 4: the longest 
forced win for KNNKB is “mate in 4” (see Figure 5) and for KNNKN it is “mate 
in 7” (see Figure 6). Because of the checks and single-reply moves involved 
here, normal full searches with extensions and quiescence searches with checks 
included might actually resolve these deeper mates more easily than my two 
example positions with their many quiet moves. 
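
The ply arithmetic behind this argument can be made explicit. The sketch below is a simplified model and not part of DarkThought: it assumes that every half-move before the mating move is quiet (neither check nor capture), so that a quiescence search following checks can contribute at most the final mating move. The function names are hypothetical.

```python
def plies_to_mate(mate_in):
    """A 'mate in n' for the side to move spans 2n - 1 half-moves."""
    return 2 * mate_in - 1


def min_full_search_depth(mate_in, quiescence_follows_checks):
    """Smallest remaining full-width depth (in plies) that sees the mate.

    Assumption: all half-moves before the mating move are quiet, so only
    the final check can be found inside the quiescence search.
    """
    total = plies_to_mate(mate_in)
    return total - 1 if quiescence_follows_checks else total


# The KNNKN "mate in 3" of Figure 4: 5 plies, 4 of them quiet.
print(plies_to_mate(3))
print(min_full_search_depth(3, quiescence_follows_checks=True))
print(min_full_search_depth(3, quiescence_follows_checks=False))
```

Under these assumptions the full-width search needs at least 4 remaining plies with a capture-check quiescence search and at least 5 with a capture-only one, which is why shallower remaining depths near the lookahead boundary miss the win.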




Figure 5. White mates in 4 moves: 1. Nb6+ Kb8 2. Nd7+ Ka8 3. Kc7 Bh1 {or 
any other legal move by B} 4. Nb6#. 

Figure 6. White mates in 7 moves: 1. Na6+ Kb7 2. Nc5+ Kb8 3. Ne7 Ng3 
4. Nc6+ Ka8, 5. Kc7, 6. Nd7, 7. Nb6#. 
An obvious solution to the problem is the usage of omniscient endgame 
databases that return the exact distance-to-win (mate or conversion to another 
won subgame) when queried. In practice, this does not really work out because 
endgame databases (even in compressed format) usually reside on secondary 
storage media due to their large sizes. Thus, querying them incurs considerable 
performance penalties and additional memory consumption for caching 
purposes. As a good compromise between accuracy and speed, most chess programs 
do not query any endgame databases in the quiescence search and very 
often stop doing so a few plies above the main lookahead boundary already. 
Please note, however, that in the particular case of KNNKB and KNNKN it is 
possible to copy Nalimov’s compressed tablebases (Nalimov et al., 2000, 2001) 
to a RAM disk requiring less than 1 MB of memory and access them from there. 
Performance-wise, the necessary I/O, index calculations, and data decompression 
still lose against the static recognition rules suggested later on in this 
text - but actually not by much. Yet, the special database setup does not allow 
for any generalization regarding other similar positions. It is precisely for 
such generalization that I see excellent promise in the rule-based approach. 

The remainder of this text is structured as follows. The next section discusses 
related work. Then, the subsequent sections focus on the various mate themes in 
KNNKB, KNNKN, and their subgames (namely KBKN, KNKN, and KNNK). 
These themes lead to the derivation of recognition rules and the final formulation 
of the full algorithm for the static recognition of potential wins in KNNKB and 
KNNKN. Last but not least, a wrap-up of the main findings and a look into the 
future conclude the work. 

2. Related Work 

There exists an ample body of related work covering endgame databases and 
infallible rule-based endgame play in chess. Both areas feature a long and rich 
history of interesting research. The introductory section above already referred 
to some important contributions in the field of endgame databases, namely 
(Thompson, 1991; The Editors, 1992; Nalimov et al., 2000, 2001). An elaborate 
discussion of endgame databases and their history was provided by Heinz 
(1999b, 2000). The introduction also mentioned the interior-node recognizers 
and knowledgeable endgame databases of DarkThought (Heinz, 1998, 
1999a, 2000). Both are of special interest here because the static recognition 
rules for KNNKB and KNNKN are to augment that very recognizer framework. 

The rest of this section now focusses on infallible rule-based endgame play in 
chess. As early as 1890, Torres y Quevedo built a marvelous electro-mechanical 
machine which played and won many of the hardest KRK positions. Tan (1972) 
implemented the first program that achieved seemingly infallible play for the 
KPK endgame. Tan’s excellent set of rules solved all difficult KPK positions 
known by then, including Averbakh’s and Fine’s famous examples. But because 
omniscient KPK endgame databases were not yet available in 1972, the 
hypothesized perfectness of the program remained unproven. However, Tan’s 
program doubtlessly pioneered the usage of decision trees with multi-valued 
nodes and leaves representing specific pattern knowledge about the respective 
endgame domain. Decision trees became an integral part of nearly all subsequent 
works which focussed on the explicit construction or automatic deduction 
of complete rule sets for infallible endgame play in chess. Later on, Bratko, 
Kopec, and Michie (1978), Bramer and Clarke (1979), and Bratko and Michie 
(1980) presented more refined representation schemes for pattern knowledge 
in chess endgames. Several good examples of these and other predominantly 
hand-crafted rule sets are listed below. 

■ KBNK - van den Herik (1983); 

■ KNNKP(h) - Herschberg, van den Herik, and Schoo (1989); 

■ KNP(h)K - van den Herik (1980, 1982); 

■ KPK - Tan (1972), Beal (1977), Beal and Clarke (1980), Bramer (1980a, 
1980b), Niblett (1982), Barth and Barth (1992); 

■ KPKP (both P passed) - Bratko (1982), Barth (1995); 

■ KRK - Torres y Quevedo [1890], Zuidema (1974), Bramer (1980a, 1982); 

■ KRKN - Bratko and Niblett (1979), Kopec and Niblett (1980); 

■ KQKP - Barth and Barth (1992); 

■ KQKQ - Barth and Barth (1992), Weill (1994). 

The surprising complexity of rules and knowledge bases for “simple” problem 
domains (such as 3-piece endgame databases in chess) ignited the interest of 
researchers in learning such infallible rules automatically. Especially the KRK 
and KPK endgame databases became extremely popular for automatic learning 
experiments. Many works that try to automate the inductive acquisition of rules 
for infallible endgame play in chess also employ decision trees as their central 
resources for the representation of semantic knowledge. The following brief 
overview of publications about automatic learning of infallible endgame play in 
chess and the inductive acquisition of rule-based knowledge therefor is meant 
to serve as a mere introduction to the field. Any more comprehensive summary 
clearly lies beyond the scope of this text. 

■ KBBKN (selected positions) - Muggleton (1988); 

■ KBRK (extended chess version) - Coplan (1998); 
■ KNNKP(h) - van Tiggelen and van den Herik (1991), van Tiggelen 
(1991, 1998); 

■ KPK - Michalski and Negri (1977), Negri (1977), Shapiro and Niblett 
(1982), Shapiro (1987), Coplan (1998); 

■ KP(a7)KR - Shapiro and Michie (1986), Shapiro (1987), Muggleton 
(1990); 

■ KRK - Bain (1994), Bain and Muggleton (1994), Bain and Srinivasan 
(1995); 

■ KRKN - Quinlan (1979, 1983), Shapiro (1987), Verhoef and Wesselius 
(1987). 

3. Checkmates in KBKN and KNKN 

Although the subgames KBKN of KNNKB and KNKN of KNNKN are 
trivially drawn, they actually do feature a few checkmate positions. These are 
shown in Figure 7 where each quadrant of the board depicts its own mate theme 
to be viewed independently of the others. Yet, normal non-mate positions in 
KBKN and KNKN never forcibly lead to the side on move being mated (i.e., 
there are only direct “mates in 1” because the side on move can always evade 
any mating attempt). Still, the mate themes of Figure 7 demand proper attention 
because they also apply to KNNKB and KNNKN where both the strong as well 
as the weak side might mate the opponent accordingly. The static recognition 
rules must take all these possibilities into full account. 

Figure 7. Mate themes in KBKN and KNKN (corner traps, not enforceable). 
4. Checkmates in KNNK 

Figure 8. Mate themes in KNNK (corner traps, not enforceable). 

Figure 9. Another mate theme in KNNK (edge trap, not enforceable). 

Despite the considerable material advantage of the strong side, the subgame 
KNNK of KNNKB and KNNKN is generally drawn too (Thompson, 1991; 
The Editors, 1992; Nalimov et al., 2000, 2001). Again, there are no enforceable 
checkmates in KNNK but the number and variety of mating themes and 
direct mates are much higher here than in the materially balanced subgames 
KBKN and KNKN. Figure 8 visualizes whole sets of mate themes in KNNK by 
showing several alternative locations of the strong King together with a fixed 
placement of the two Knights and the weak King in the same single quadrant. 
The checkmate positions in each such set differ solely by the location of the 
strong King. Like the mate themes of KBKN and KNKN, these KNNK checkmates 
all involve trapping the weak King in a corner of the board. In addition 
to the corner traps, there is another special mate theme in KNNK that works by 
trapping the weak King on the edge of the board away from the corner. Figure 9 
depicts this additional mate theme for one possible location of the weak King 
on the edge in the upper half of the board. The theme remains valid when shifting 
it to the left or right within the bounds of the board, of course. Although 
not actively enforceable within KNNK, all the mate themes of Figures 8 
and 9 exemplify potential wins by the strong sides in KNNKB and KNNKN. 
Therefore, the static recognition rules must also cover them properly. 

5. Checkmates in KNNKB and KNNKN 

Those readers who are still not convinced that KNNKB and KNNKN positions 
deserve better than being scored as some kind of draw might finally 
reconsider after taking a look at the following numbers found in Nalimov’s 
tablebase summary files (Nalimov et al., 2000, 2001). Roughly 10,000 position 
templates in KNNKB and 40,000 position templates in KNNKN are won 
for the KNN side. The vast majority of them are non-direct forced wins requiring 
several moves to mate. Including symmetries, the real numbers of won 
positions for the KNN side amount to 4 to 8 times as many: i.e., 40,000 to 80,000 
in KNNKB and 160,000 to 320,000 in KNNKN. 

Compared with the 300 to 600 forced wins of KNNK (all direct mates), there 
are orders of magnitude more forced wins in KNNKB and KNNKN where the 
weak side features a minor piece in addition to the King. Hence, the KNN side 
is actually better at enforcing checkmate if the opponent defends itself with 
more material than just a lone King. As counter-intuitive as this might seem 
at first glance, it is quite well-known and not too hard to understand because 
the additional piece prevents stalemates and may even block an escape route 
of the weak King.¹ However, the added material also enables the weak side 
to mate the opponent in some non-enforceable circumstances brought about by 
bad play of the strong side. Consequently, KNNKB and KNNKN contain some 
positions where the weak side wins and the KNN side is mated. 

5.1 Weak Side Wins 

The mate themes of all subgames still apply in KNNKB and KNNKN as 
well, with the excess piece (a strong Knight) located anywhere else on the 
board in legal fashion. If the second Knight of the KNN side also resides 
directly beside the strong King trapped in a corner, the noteworthy additional 
mate themes shown in Figure 10 arise. The static recognition rules must take 
all such possibilities into full account. 

Figure 10. Additional mate themes for weak side in KNNK[B,N] (not enforceable). 

¹ The famous forced wins in < 7 moves with a single Knight against a Pawn in KNKP(a,h) exploit the very 
same strategy. 

5.2 KNN Side Wins 




Figure 11. Mate themes without King support in KNNK[B,N] (not enforceable). 



As before, the mate themes of all subgames apply in KNNKB and KNNKN 
too. On top of these, the KNN side may now mate the opponent even without 
any support of its own King. Figure 11 presents the corresponding NN-checkmates 
which are quite exceptional and not enforceable. Other additional mate themes 
of KNNKB involving the full set of 5 pieces on the board are shown in Figures 12 
and 13 (corner traps) and Figure 14 (edge traps). This overview of positions 
with the KNN side winning is by no means exhaustive. But due to space 
limitations, the remaining positions won by the strong side in KNNKB cannot 
be shown here. Unfortunately, the very same holds for all additional KNNKN 
mate themes and positions won by the strong side there. Nevertheless, the static 
recognition rules must of course cover them all in a suitable way too. 




Figure 12. Additional mate themes for strong side in KNNKB (I). 




Figure 13. Additional mate themes for strong side in KNNKB (II). 
Figure 14. Additional mate themes for strong side in KNNKB (III). 



6. Static Recognition Rules 

The preceding sections on checkmates in KNNKB and KNNKN plus all 
their subgames (KBKN, KNKN, KNNK) argue that all possible mate themes in 
these endgames involve trapping the enemy King either in the corner or on the 
edge of the board. The omniscient endgame databases confirm this notion but 
their exhaustive querying also reveals some forced wins for the KNN side in 
KNNKN where the weak King resides on one of the “extended corner” squares 
of the board, namely b2, b7, g2, and g7. Figure 15 shows such a position 
which arises from the forced win in 7 moves of Figure 6 after 1. Na6+ Kb7. 
The ensuing motif of how to enforce the final checkmate does not work against a 
defending Bishop. Hence, there are no forced wins with the weak King located 
on “extended corner” squares in KNNKB. 

Figure 15. White mates in 6 moves - see Figure 6 after 1. Na6+ Kb7. 

Strong-Win Potential. The winning chances of the strong side crucially hinge on 
its ability to keep the weak King trapped on the edge and, in case of 
KNNKN, the “extended corner” squares of the board. Success in doing so 
is quite tedious to determine exactly because of possible checks, attacks 
on the strong Knights, and even pins by the weak Bishop in KNNKB. 

NN-Mate Rule. If the weak King is located in a corner of the board with its 
Bishop or Knight directly beside it on an “extended corner” square and 
a strong Knight trapping it from the next square on the long diagonal, 
then the special mate themes of Figure 11 loom. They do not require any 
direct support by the strong King. So, the position is a guaranteed win 
for the KNN side if the other strong Knight already gives a check or is on 
move and able to deliver a direct check (in KNNKN this holds even if the 
strong King is currently in check itself). Otherwise, the position is drawn 
in KNNKB if the weak side is on move or the strong side is in check 
because then the weak Bishop can capture a Knight (see Figure 11). 

Weak-Draw Rule. If the weak King does not reside on the edge of the board 
and not on any “extended corner” square in case of KNNKN either, then 
the weak side at least draws. The same holds if the weak side is on move 
and the weak King can directly step off the edge and the “extended corner” 
in case of KNNKN. If the distance between the two Kings exceeds 4 steps 
measured in squares on the board, the position is drawn too as discovered 
by exhaustive analyses of the endgame databases KNNKB and KNNKN. 
Depending on the side-to-move and whether it is a KNNKB or KNNKN 
position, the distances between the two Kings triggering a draw are even 
smaller (see recognition algorithm below for more details). 

Weak-Win Rule. If the strong King is located in a corner of the board with at 
least one of its Knights directly beside it on the edge of the board and the 
weak King covers the “extended corner” square next to the strong King, 
then the weak side might even win whereas the strong side at most draws. 
If so and the strong side is on move but not checkmated, then the position 
is drawn. If so and the weak side is on move but cannot directly check 
and mate the opponent, then the position is drawn as well. 
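
Two of the conditions of the Weak-Draw Rule are simple enough to sketch directly. The code below is an illustration under stated assumptions, not the recognizer of DarkThought: squares are written as strings such as "b2", the King distance is assumed to be the number of King steps (Chebyshev distance), and the function names are hypothetical. The clause about the weak King stepping off the edge when on move is deliberately left out. A result of False only means that these two conditions do not decide the position.

```python
FILES = "abcdefgh"
XCORNER = {"b2", "g2", "b7", "g7"}   # the "extended corner" squares


def coords(square):
    """Map a square such as 'e4' to zero-based (file, rank) indices."""
    return FILES.index(square[0]), int(square[1]) - 1


def on_edge(square):
    f, r = coords(square)
    return f in (0, 7) or r in (0, 7)


def king_distance(a, b):
    """Number of King steps between two squares (assumed metric)."""
    (fa, ra), (fb, rb) = coords(a), coords(b)
    return max(abs(fa - fb), abs(ra - rb))


def weak_side_draws(weak_k, strong_k, is_knnkn):
    """True if the position is drawn by the two static conditions."""
    off_trap = not on_edge(weak_k) and not (is_knnkn and weak_k in XCORNER)
    return off_trap or king_distance(strong_k, weak_k) > 4


print(weak_side_draws("e4", "a1", is_knnkn=False))
print(weak_side_draws("b2", "d3", is_knnkn=True))
print(weak_side_draws("b2", "d3", is_knnkn=False))
```

The last two calls differ only in the endgame type: the weak King on b2 is still in danger in KNNKN but not in KNNKB.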
7. Static Recognition Algorithm 



Constant and Type Declarations 

TYPE boardstate = ...;   /* state of a given position on chess board */ 
TYPE score      = ...;   /* range of valid scores */ 
TYPE side       = ENUM {black, white}; 
TYPE square     = ENUM {a1, ..., h1, ..., a8, ..., h8}; 

const SET OF square: corner  = {a1, h1, a8, h8}; 
const SET OF square: edge    = {a1, ..., h1, a2, h2, ..., a7, h7, a8, ..., h8}; 
const SET OF square: xcorner = {b2, g2, b7, g7}; 



KNNK[B,N] Recognition Function 

FUNC score knn_k_b_n_recog(const boardstate: pos; const side: strong, weak) { 

  const square:        strong_k      = k_sqr(strong, pos); 
  const SET OF square: strong_k_area = k_attck(strong_k); 
  const SET OF square: strong_nn     = n_sqrs(strong, pos); 
  const SET OF square: weak_b_n      = b_sqrs(weak, pos) + n_sqrs(weak, pos); 
  const square:        weak_k        = k_sqr(weak, pos); 
  const SET OF square: weak_k_area   = k_attck(weak_k); 
  const square:        weak_minor    = ANYELEM(weak_b_n); 

  /***** WEAK-WIN PART *****/ 



  IF (strong_k IN corner)   /* strong K trapped by own Ns and weak K with */ 
     && EMPTY(strong_k_area - strong_nn - weak_k_area)   /* no escape */ 
  { 
    const SET OF square: b_mates = xcorner * strong_k_area;   /* target */ 
    const SET OF square: n_mates = n_attck(strong_k);         /* squares for */ 
                                                    /* B, N to mate strong K */ 
    IF (side_to_move(pos) == strong) 
       && ((is_knnkb(pos) && !EMPTY(weak_b_n * b_mates)) 
           || (is_knnkn(pos) && !EMPTY(weak_b_n * n_mates))) 
      RETURN stm_mated_score(pos); 
    ELSE IF ((side_to_move(pos) == weak)   /* weak side on move may mate */ 
       && EMPTY(n_attck(weak_k) * strong_nn)   /* if not in check */ 
       && ((is_knnkb(pos) && !EMPTY(b_attck(weak_minor, pos) * b_mates)) 
           || (is_knnkn(pos) && !EMPTY(n_attck(weak_minor) * n_mates)))) 
      RETURN stm_mates_score(pos); 
    ELSE 
      RETURN draw_score(pos);   /* otherwise, position is drawn */ 
  } 

  /***** WEAK-DRAW PART (I) *****/ 
                                           /* drawn if weak K not on edge */ 
  IF !(weak_k IN edge) && !((weak_k IN xcorner) && is_knnkn(pos))   /* and */ 
    RETURN draw_score(pos);   /* not on "extended corner" in KNNKN */ 

/***** NN-MATE PART *****/ 



IF (weak.k IN corner) /* weak K in corner trapped by own B, N on */ 

kk ! EMPTY ( we ak_k_area * xcorner * weak_b_n) /* "extended corner" */ 
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&& ! EMPTY (strong.nn * {c3, f3, c6, f6> * k_attck(weakjninor) ) 

{ /* and by strong N in diagonal opposition */ 

IF ! EMPTY (strong.nn * n_attck(weak_k) ) /* weak K also in check by */ 

RETURN stm_mated_score(pos) ; /* 2nd strong N ==> checkmate! */ 

IF isjmnkb(pos) && ( (side_to_move(pos) == weak) II (strong_k IN 

(k_attck(weak_minor) * {cl, fl, a3, h3, a6, h6, c8, f8»)) 
RETURN draw_score(pos) ; /* drawn in KNNKB if weak side on move */ 

/* or strong side in check */ 

IF (side_to_move(pos) == strong) && ! EMPTY (n_attck(weak_k) * n_attck( 
ANYELEM ( str ong_nn - {c3, f3, c6, f6> * k_attck(weak_minor) )) ) 
/* strong side on move and other strong N ready to deliver mate */ 
RETURN (is_knnkb(pos) II ! (strong_k IN n_attck(weak_minor))) 

? stm_mates_score(pos) : stm_mates_next_score(pos) ; 

RETURN rcg_fail_score(pos) ; /* weak side on move in KNNKN ==> */ 

} /* may still draw (unwind the trap by removal of N) */ 

/***** WEAK-DRAW PART (II) *****/ 

/* drawn if K distance > 4 steps */ 
IF sqr_dist (strong_k, weak_k) > 4 RETURN draw_score(pos) ; 

IF side_to_move(pos) == weak 

{ /* calculate escape squares of weak K */ 

const SET OF square: esc.area = weak_k_area - weak_b_n - strong_k_area 
- n_attck(FIRSTELEM(strong_nn)) - n_attck(LASTELEM(strong_nn)) ; 

IF !EMPTY(esc_area - edge - (is_knnkn(pos) ? xcorner)) /* weak K */ 
RETURN draw_score(pos) ; /* can escape from trap ==> draw */ 

} 

/***** STRONG-WIN PART ****♦/ 

RETURN rcg_fail_score(pos) ; /* handle tricky issues by further search */ 

y /* and trigger an extension in this line */ 



7.1 Algorithm Description 

Auxiliary Functions. The recognition algorithm relies on several auxiliary 
functions not specified in detail here. There are a number of routines to 
access and query the current state of the chess board passed in the parameter 
pos of type boardstate: k_sqr returns the King location of the desired 
side; b_sqrs and n_sqrs return the locations of all Bishops and Knights 
respectively for the desired side; is_knnkn and is_knnkb identify the 
exact material balance; and side_to_move returns the side on move in the 
given position. Another group of auxiliary functions handles the encoding 
of recognizer failures, checkmates, draws, and mates in this or the next 
move after it into valid scores: rcg_fail_score, stm_mated_score, 
draw_score, stm_mates_score, and stm_mates_next_score. The 
numerical function sqr_dist returns the distance between two squares 
on the board as measured in single-square steps that a King needs in 
moves on an empty board to travel from one to the other. Last but not 
least, the algorithm requires support for the calculation of sets of squares 
attacked by Bishops, Kings, and Knights located anywhere on the board. 
The functions b_attck, k_attck, and n_attck perform the corresponding 
attack generations for B, K, and N respectively. The sliding coverage 
of Bishops along their diagonals specifically depends on the full board 
state, whereas Kings and Knights always attack the same sets of squares 
from a given location regardless of any other pieces. 

Constants and Types. The constant sets corner, edge, and xcorner capture 
the important corner, edge, and “extended corner” squares of the 
chessboard. The enumeration type side contains just two items: black 
and white. The enumeration type square covers all board squares denoted 
by the 64 items a1, ..., h1, ..., a8, ..., h8. The anonymous types 
boardstate and score represent the full states of chess positions and 
scoring values respectively. 

Pattern Recognition. The algorithm applies basic set operations on sets of 
squares to achieve location-independent pattern recognition. As an example, 
take the core NN-mate pattern of the weak King, in any corner, the 
weak Bishop or Knight directly beside it on the corresponding “extended 
corner” square, and one of the strong two Knights diagonally beside the 
weak minor piece as depicted in Figure 11. The membership test weak_k 
IN corner assures that the weak King resides in a corner. Then, the 
intersection weak_k_area * xcorner * weak_b_n gives the set of “extended 
corner” squares with a weak Bishop or Knight directly beside the weak 
King. If the set is not empty, it contains the square of the weak minor piece 
as a single element and the second pattern condition holds. Finally, 
intersecting k_attck(weak_minor) * {c3, f3, c6, f6} * strong_nn computes 
the set of squares with strong Knights directly and inwardly beside 
the weak minor. If this set is not empty, the full core pattern is identified 
independent of the specific corner square the weak King is located on. 
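
The three set tests of the core NN-mate pattern can be sketched in Python. This is only an illustration under stated assumptions: squares are numbered 0..63 with a1 = 0 and h8 = 63, sets of squares are plain integers used as bit vectors, and the helper names (sq, squares, core_nn_mate_pattern) are hypothetical rather than part of the original algorithm.

```python
def sq(name: str) -> int:
    """Square number for an algebraic name, assuming a1 = 0, b1 = 1, ..., h8 = 63."""
    return (ord(name[0]) - ord("a")) + 8 * (int(name[1]) - 1)

def squares(*names: str) -> int:
    """Set of squares as an integer bit vector."""
    result = 0
    for name in names:
        result |= 1 << sq(name)
    return result

CORNER = squares("a1", "h1", "a8", "h8")
XCORNER = squares("b2", "g2", "b7", "g7")
INNER = squares("c3", "f3", "c6", "f6")

def k_attck(square: int) -> int:
    """Squares attacked by a King on the given square."""
    rank0, file0 = divmod(square, 8)
    result = 0
    for dr in (-1, 0, 1):
        for df in (-1, 0, 1):
            r, f = rank0 + dr, file0 + df
            if (dr or df) and 0 <= r < 8 and 0 <= f < 8:
                result |= 1 << (r * 8 + f)
    return result

def core_nn_mate_pattern(weak_k: int, weak_b_n: int, strong_nn: int) -> bool:
    """True if weak K is in a corner, its minor sits on the adjacent
    "extended corner" square, and a strong N stands diagonally inward of it."""
    if not (CORNER >> weak_k) & 1:                  # weak_k IN corner
        return False
    trapped = k_attck(weak_k) & XCORNER & weak_b_n  # weak_k_area * xcorner * weak_b_n
    if trapped == 0:
        return False
    weak_minor = trapped.bit_length() - 1           # the single element of the set
    return (k_attck(weak_minor) & INNER & strong_nn) != 0
```

Because every test is an intersection with a constant set, the same code covers all four corners without any case distinction, which is the point of the location-independent formulation.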

Weak-Win Part. The recognition starts with the exceptional wins by the weak 
side where the strong King is trapped in a corner by at least one of its 
Knights and the weak King. Depending on which side is on move and 
whether the weak minor piece can actually deliver a checkmate, the 
algorithm returns mate or mated scores and a draw score otherwise. A clever 
trick used here to determine if a single square is attacked by any piece 
from a set of like pieces works as follows: call the specific attack function 
of the given piece type with the very square in question as the location 
parameter, then intersect the resulting attack squares with the original set 
of like pieces; if and only if the intersection is not empty, the square 







E.A. Heinz 



in question is under attack by some piece from the set of like ones. This 
scheme excels at check detection. The term EMPTY(n_attck(weak_k) 
* strong_nn), for instance, assures that the weak King is not in check 
by any of the strong Knights. 
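
The check-detection trick can be sketched as follows. This is a minimal illustration, assuming squares numbered 0..63 (a1 = 0, h8 = 63) and sets of squares stored as integers; the names N_ATTCK and attacked_by_knights are hypothetical. The trick relies on Knight attacks being symmetric: a Knight on x attacks y exactly when a Knight on y would attack x.

```python
def knight_attacks(square: int) -> int:
    """Bit vector of the squares a Knight on `square` attacks."""
    rank0, file0 = divmod(square, 8)
    result = 0
    for dr, df in ((1, 2), (2, 1), (-1, 2), (-2, 1),
                   (1, -2), (2, -1), (-1, -2), (-2, -1)):
        r, f = rank0 + dr, file0 + df
        if 0 <= r < 8 and 0 <= f < 8:
            result |= 1 << (r * 8 + f)
    return result

# Precomputed lookup table indexed by square number.
N_ATTCK = [knight_attacks(s) for s in range(64)]

def attacked_by_knights(square: int, knights: int) -> bool:
    """Generate Knight attacks *from* the square in question and intersect
    them with the set of Knights; non-empty means the square is attacked."""
    return (N_ATTCK[square] & knights) != 0
```

With a King on a1 (square 0), a Knight on b3 (square 17) gives check while a Knight on c3 (square 18) does not, so `attacked_by_knights(0, 1 << 17)` is true and `attacked_by_knights(0, 1 << 18)` is false.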

Weak-Draw Part (I). This straightforward section detects draws by the rule 
that the weak King is not on the edge of the board and not on any “extended 
corner” squares in KNNKN either. 

NN-Mate Part. The paragraph on pattern recognition above already discussed 
the core NN-mate pattern and its recognition in detail. After establishing 
that the core NN-mate pattern applies, the algorithm tests for check- 
mate by the second strong Knight attacking the weak King, for draws in 
KNNKB with the weak side on move or the strong side in check, and for 
forced mates by the strong side with the second strong Knight ready to 
deliver the final check. Otherwise, the weak side is on move in KNNKN 
and may still draw by removing the weak Knight from the “extended cor- 
ner” square, thus unwinding the trap. The static recognizer intentionally 
fails at this point in order to resolve the resulting complications of checks 
and Knight forks by further search. 

Weak-Draw Part (II). First, the algorithm detects draws by the rule “Kings 
more than 4 steps apart”. Then, the next draw detection deals with the 
case that the weak side is on move and may directly step off the edge and 
the “extended corner” in case of KNNKN. The available escape squares 
of the weak King are those squares around it not blocked by the weak 
minor piece and not attacked by either the strong King or its Knights. If 
the set difference of these escape squares and the edge of the board (plus 
the “extended corner” squares in case of KNNKN) is not empty, then the 
weak King directly escapes from the trap and the position is drawn. 

Strong-Win Part. Whenever no obvious drawing rule for the weak side 
applies, the static recognizer fails. In case of KNNKN, the weak King still 
seems to be trapped on the edge of the board or the “extended corner” 
squares. Further search then resolves the tricky issues of possible checks, 
attacks on the strong Knights, and pins of the weak Bishop in KNNKB. 
In general, such explicitly intended failures of static recognizers should 
trigger search extensions in the current line. If so desired, more ambitious 
analyses of the piece constellation and attack relations on the board 
aiming for an even better identification of real wins in KNNKB and KNNKN 
may easily be added in front of the fail-value return at the end. 




Static Recognition of Potential Wins in KNNKB and KNNKN 






7.2 Algorithmic Complexity 

The recognition algorithm heavily depends on sets of squares and basic op- 
erations on them: set difference, element count, emptiness, intersection, mem- 
bership, member selection, and union. Other important auxiliary functions are 
those for attack generation and access to the data structure holding the full state 
of the current board position. 

Sets of Squares. There are 64 squares on a chess board. Hence, the best way to 
handle sets of squares is by means of a standard bit-vector representation 
with exactly 64 bits (one for each square) where square i is in the set if 
and only if the i-th bit of the vector is 1. Thus, sets of squares nicely map 
to 64-bit unsigned integers which are natural data types of modern CPUs. 
In computer chess such 64-bit values are also known as “bitboards”. 

Basic Set Operations. For sets represented as bit vectors, all basic set 
operations map to simple constant-time computations involving unsigned 
64-bit data: difference → bit-wise AND complement, element count → 
count bits (a.k.a. population count), emptiness → compare with 0, 
intersection → bit-wise AND, membership → test bit, selection → find bit, and 
union → bit-wise OR. Most of these computations actually finish within 
a single clock cycle on modern CPUs. The 64-bit unsigned integer value 
0 represents the empty set and comparisons for set equality are done by 
standard tests comparing 64-bit unsigned integer values. 
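
The mapping from set operations to bit operations listed above can be sketched in Python. The function names are illustrative assumptions; Python integers are unbounded, so the 64-bit width is a convention here rather than something the language enforces.

```python
FULL = (1 << 64) - 1          # the set of all 64 squares

def bit(square: int) -> int:
    """Singleton set containing one square (0..63)."""
    return 1 << square

def union(a: int, b: int) -> int:
    return a | b              # bit-wise OR

def intersection(a: int, b: int) -> int:
    return a & b              # bit-wise AND

def difference(a: int, b: int) -> int:
    return a & ~b & FULL      # bit-wise AND with the complement

def is_empty(a: int) -> bool:
    return a == 0             # compare with 0

def contains(a: int, square: int) -> bool:
    return (a >> square) & 1 == 1   # test bit

def count(a: int) -> int:
    return bin(a).count("1")  # population count

def any_elem(a: int) -> int:
    """Select one member (the lowest-numbered square); `a` must be non-empty."""
    return (a & -a).bit_length() - 1
```

In a language with native 64-bit unsigned integers, each of these functions would typically compile down to one or two machine instructions.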

Attack Generation. The squares attacked by Kings and Knights depend on 
their specific locations only, regardless of the placement of any other 
pieces. Straightforward table lookups indexed by square numbers suffice 
to perform the corresponding attack calculations k_attck and n_attck. 
Bishops, on the other hand, are sliding pieces that depend on the full board 
constellation to determine the exact extent of their attack coverage. Even 
if implemented by looping over squares in the four diagonal directions, 
the respective attack calculations of b_attck are constant-time bound 
because there are at most 13 squares to traverse (7 on the middle diagonal 
of the board and another 6 on one next to the middle). Moreover, 
so-called “rotated bitboards” (Hyatt, 1999; Heinz, 1997, 2000) enable the 
full Bishop attack calculations to be done by a few table lookups. 
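
The simple looping variant of Bishop attack generation might look as follows. This sketch assumes squares numbered 0..63 (a1 = 0, h8 = 63) and takes the set of occupied squares directly, whereas the paper's b_attck receives the full board state pos; it does not implement rotated bitboards.

```python
def b_attck(square: int, occupied: int) -> int:
    """Squares attacked by a Bishop on `square`, given the set of occupied
    squares. Each diagonal ray stops at, and includes, the first blocker."""
    rank0, file0 = divmod(square, 8)
    attacks = 0
    for dr, df in ((1, 1), (1, -1), (-1, 1), (-1, -1)):
        r, f = rank0 + dr, file0 + df
        while 0 <= r < 8 and 0 <= f < 8:
            target = 1 << (r * 8 + f)
            attacks |= target
            if occupied & target:
                break          # the blocker is attacked, squares behind it are not
            r, f = r + dr, f + df
    return attacks
```

On an empty board a Bishop on d4 (square 27) reaches 13 squares, which is the upper bound quoted above.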

Remaining Auxiliary Functions. Except for attack generation, the auxiliary 
functions either encapsulate simple access protocols to the data structure 
carrying the current state of the chess board or they perform equally 
simple score value encodings. All these computations are constant-time 
bound and take only a few clock cycles to finish on modern CPUs. The 
same holds for sqr_dist, an auxiliary function not covered up to now: 

sqr_dist(x,y) = MAX( ABS(VAL(x)/8 - VAL(y)/8), ABS(VAL(x)%8 - VAL(y)%8) ). 

All in all, the recognition algorithm contains only constant-time bound 
computations and no loops. Hence, it is of constant time complexity in O(1). As the 
average and longest execution paths through the algorithm are short and most 
of the calculations actually finish within a few clock cycles on modern CPUs, 
the whole algorithm also features good efficiency in practice where acceptably 
small constants cap its average and worst-case execution times. 
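
The sqr_dist formula translates directly into code. The sketch below assumes that VAL numbers the squares 0..63 with a1 = 0 and h8 = 63, so that integer division by 8 yields the rank and the remainder yields the file.

```python
def sqr_dist(x: int, y: int) -> int:
    """Number of King steps between squares x and y on an empty board."""
    rank_diff = abs(x // 8 - y // 8)
    file_diff = abs(x % 8 - y % 8)
    return max(rank_diff, file_diff)
```

For example, the opposite corners a1 (0) and h8 (63) are 7 steps apart, so a position with the Kings on those squares falls under the "more than 4 steps apart" drawing rule.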

8. Conclusion and Future Work 

Hundreds of thousands of positions in KNNKB and KNNKN are won for 
the KNN side. Tricky mate themes occur more frequently and require more 
complicated handling in these two endgames than common wisdom makes 
people think. In fact, they are not trivial at all! This paper may very well be the 
first ever to present a rule-based static recognition algorithm for any complete 
non-trivial 5-piece endgame because the fine works by Herschberg et al. (1989), 
van Tiggelen and van den Herik (1991), and van Tiggelen (1991, 1998) consider 
only the subset KNNKP(h) of the full KNNKP endgame. 

All mate themes and rules were developed a-priori by hand. Then, later on, 
their validity was checked against omniscient endgame databases a-posteriori. 
In particular, the “trapped King” feature seems very important and powerful 
for endgames in general and is probably good for static recognition in other 
endgames as well. Such trapping and the number of escape squares for each 
King could possibly be used as a crucial position feature and input parameter for 
machine-learning algorithms that try to extract useful knowledge from endgame 
databases automatically. The trap patterns look interesting for chess problem 
composers, too, who have certainly discovered them on their own already. 

In the future, I would like to use the KNNKB and KNNKN recognition rules as a 
foundation to statically detect possible draws and “mates in X” in other positions 
not covered by endgame databases directly (e.g., additional material might not 
save Black in Figure 3). Moreover, one can still extend the current algorithm 
to include better static mate detection and further knowledge about enforceable 
“mate in X” positions. It is also possible to down-scale and specifically adapt the 
algorithm for the subgames KBKN, KNKN, KNNK, and the endgame KBKB. 

References 

Bain, M. and Srinivasan, A. (1995). Inductive Logic Programming with Large-Scale Unstructured 
Data. Machine Intelligence 14, K. Furukawa, D. Michie, and S. Muggleton (eds.), pp. 233- 
267, Oxford University Press. 

Bain, M. (1994). Learning Logical Exceptions in Chess. Ph.D. Thesis, University of Strathclyde 
[printed as Thesis 7866, Dept. of Statistics and Modelling Science, University of Strathclyde]. 

Bain, M. and Muggleton, S. (1994). Learning Optimal Chess Strategies. Machine Intelligence 13, 
K. Furukawa, D. Michie, and S. Muggleton (eds.), pp. 291-309, Oxford University Press. 

Barth, W. (1995). Combining Knowledge and Search to Yield Infallible Endgame Programs. 
ICCA Journal, Vol. 18, No. 3, pp. 149-159. 







Barth, W. and Barth, S. (1992). Validating a Range of Endgame Programs. ICCA Journal, Vol. 15, 
No. 3, pp. 132-139. 

Beal, D.F. and Clarke, M.R.B. (1980). The Construction of Economical and Correct Algorithms 
for King and Pawn against King. Advances in Computer Chess 2, M.R.B. Clarke (ed.), pp. 1- 
30, Edinburgh University Press. 

Beal, D.F. (1977). Discriminating Wins from Draws in King+Pawn versus King Chess Endgames. 
Unpublished Report, Queen Mary College. 

Bramer, M.A. (1982). Refinement of Correct Strategies for the Endgame in Chess. SIGART 
Newsletter, Vol. 80, pp. 155-163 [reprinted in Computer Game-Playing: Theory and Practice, 
M.A. Bramer (ed.), pp. 106-124, 1983, Ellis Horwood]. 

Bramer, M.A. (1980a). Correct and Optimal Strategies in Game-Playing Programs. Computer 
Journal, Vol. 24, No. 4, pp. 347-352. 

Bramer, M.A. (1980b). An Optimal Algorithm for King and Pawn against King using Pattern 
Knowledge. Advances in Computer Chess 2, M.R.B. Clarke (ed.), pp. 82-91, Edinburgh 
University Press. 

Bramer, M.A. and Clarke, M.R.B. (1979). A Model for the Representation of Pattern-Knowledge 
for the Endgame in Chess. Intl. Journal of Man-Machine Studies, Vol. 11, No. 5, pp. 635-649. 

Bratko, I. (1982). Knowledge-Based Problem Solving in AL3. Machine Intelligence 10, J.E. 
Hayes, D. Michie, and Y.-H. Pao (eds.), pp. 73-100, Ellis Horwood. 

Bratko, I. and Michie, D. (1980). A Representation for Pattern Knowledge in Chess Endgames. 
Advances in Computer Chess 2, M.R.B. Clarke (ed.), pp. 31-56, Edinburgh University Press. 

Bratko, I. and Niblett, T. (1979). Conjectures and Refutations in a Framework for Chess Endgame 
Knowledge. Expert Systems in the Micro-Electronic Age, D. Michie (ed.), pp. 83-102, Edin- 
burgh University Press. 

Bratko, I., Kopec, D., and Michie, D. (1978). Pattern-Based Representation of Chess End-Game 
Knowledge. Computer Journal, Vol. 21, No. 2, pp. 149-153. 

Coplan, K.P. (1998). Synthesis of Chess and Chess-like Endgames by Recursive Optimization. 
ICCA Journal, Vol. 21, No. 3, pp. 169-182. 

Heinz, E.A. (2000). Scalable Search in Computer Chess. Vieweg. 

Heinz, E.A. (1999a). Knowledgeable Encoding and Querying of Endgame Databases. ICCA 
Journal, Vol. 22, No. 2, pp. 81-97. 

Heinz, E.A. (1999b). Endgame Databases and Efficient Index Schemes for Chess. ICCA Journal, 
Vol. 22, No. 1, pp. 22-32. 

Heinz, E.A. (1998). Efficient Interior-Node Recognition. ICCA Journal, Vol. 21, No. 3, 
pp. 156-167. 

Heinz, E.A. (1997). How DarkThought Plays Chess. ICCA Journal, Vol. 20, No. 3, pp. 
166-176. 

Herik, H.J., van den (1983). Representation of Experts’ Knowledge in a Subdomain of Chess 
Intelligence. 8th International Joint Conference on Artificial Intelligence, Proceedings Vol. I, 
A. Bundy (ed.), pp. 252-255, Kaufmann. 

Herik, H.J., van den (1982). Strategy in Chess Endgames. SIGART Newsletter, Vol. 80, pp. 145- 
154 [reprinted in Computer Game-Playing: Theory and Practice, M.A. Bramer (ed.), pp. 87- 
105, 1983, Ellis Horwood]. 

Herik, H.J., van den (1980). Goal-Directed Search in Chess Endgames. Delft Progress Report, 
Vol. 5, No. 4, pp. 253-279, Delft University of Technology [reprinted in Computer Chess 
Compendium, D.N.L. Levy (ed.), pp. 316-329, Springer, 1989]. 

Herschberg, I.S., van den Herik, H.J., and Schoo, P. N. A. (1989). Verifying and Codifying 
Strategies in the KNNKP(h) Endgame. ICCA Journal, Vol. 12, No. 3, pp. 144-154 [reprinted 
in Computers, Chess, and Cognition, T.A. Marsland and J. Schaeffer (eds.), pp. 183-196, 
Springer, 1990]. 







Hyatt, R.M. (1999). Rotated Bitmaps, a New Twist on an Old Idea. ICCA Journal, Vol. 22, No. 4, 
pp. 213-222. 

Kopec, D. and Niblett, T.B. (1980). How Hard is the Play of the King-Rook-King-Knight Ending? 
Advances in Computer Chess 2, M.R.B. Clarke (ed.), pp. 57-81, Edinburgh University Press. 

Michalski, R.S. and Negri, P. (1977). An Experiment on Inductive Learning in Chess End Games. 
Machine Intelligence 8, E.W. Elcock and D. Michie (eds.), pp. 175-192, Ellis Horwood. 

Muggleton, S. H. (1990). Inductive Acquisition of Expert Knowledge. Turing Institute Press. 

Muggleton, S. (1988). Inductive Acquisition of Chess Strategies. Machine Intelligence 11, J.E. 
Hayes, D. Michie, and J. Richards (eds.), pp. 375-389, Oxford University Press. 

Nalimov, E.V., Haworth, G.McC., and Heinz, E.A. (2001). Space-Efficient Indexing of Endgame 
Tables for Chess. Advances in Computer Games 9, H.J. van den Herik and B. Monien (eds.), 
pp. 93-113, Institute for Knowledge and Agent Technology, University of Maastricht. 

Nalimov, E.V., Haworth, G.McC., and Heinz, E.A. (2000). Space-Efficient Indexing of Chess 
Endgame Tables. ICGA Journal, Vol. 23, No. 3, pp. 148-162. 

Negri, P. (1977). Inductive Learning in a Hierarchical Model for Representing Knowledge in 
Chess End Games. Machine Intelligence 8, E.W. Elcock and D. Michie (eds.), pp. 193-204, 
Ellis Horwood. 

Niblett, T.B. (1982). A Provably Correct Advice Strategy for the End-Game of King and Pawn 
versus King. Machine Intelligence 10, J.E. Hayes, D. Michie, and Y.-H. Pao (eds.), 
pp. 101-120, Ellis Horwood. 

Quinlan, J.R. (1983). Learning Efficient Classification Procedures and Their Application to 
Chess End Games. Machine Learning: An Artificial Intelligence Approach, R.S. Michalski, 
J.G. Carbonnell, and T.M. Mitchell (eds.), pp. 463-482, Tioga [reprinted by Springer, 1983]. 

Quinlan, J.R. (1979). Discovering Rules by Induction from Large Collections of Examples. Ex- 
pert Systems in the Micro-Electronic Age, D. Michie (ed.), pp. 168-201, Edinburgh University 
Press. 

Shapiro, A.D. (1987). Structured Induction in Expert Systems. Turing Institute Press. 

Shapiro, A.D. and Michie, D. (1986). A Self-Commenting Facility for Inductively Synthesized 
Endgame Expertise. Advances in Computer Chess 4, D.F. Beal (ed.), pp. 147-165, Pergamon. 

Shapiro, A.D. and Niblett, T.B. (1982). Automatic Induction of Classification Rules for a Chess 
Endgame. Advances in Computer Chess 3, M.R.B. Clarke (ed.), pp. 73-92, Pergamon. 

Tan, S.T. (1972). Representation of Knowledge for Very Simple Endings in Chess. Memorandum 
MIP-R-98, School of Artificial Intelligence, University of Edinburgh. 

The Editors. (1992). Thompson: All about Five Men. ICCA Journal, Vol. 15, No. 3, pp. 140-143. 

Thompson, K. (1991). Chess Endgames Volume 1. ICCA Journal, Vol. 14, No. 1, p. 22. 

Tiggelen, A., van (1998). Heuristic Search Methods in Parameter Space. Engineering Bureau 
van Tiggelen. 

Tiggelen, A., van (1991). Neural Networks as a Guide to Optimization. ICCA Journal, Vol. 14, 
No. 3, pp. 115-118. 

Tiggelen, A., van and van den Herik, H.J. (1991). Alexs: An Optimization Approach for the 
Endgame KNNKP(h). Advances in Computer Chess 6, D.F. Beal (ed.), pp. 161-177, Ellis 
Horwood. 

Verhoef, T.F. and Wesselius, J.H. (1987). Two-Ply KRKN: Safely Overtaking Quinlan. ICCA 
Journal, Vol. 10, No. 4, pp. 181-190. 

Weill, J.-C. (1994). How Hard is the Correct Coding of an Easy Endgame? Advances in Computer 
Chess 7, H.J. van den Herik, I.S. Herschberg, and J.W.H.M. Uiterwijk (eds.), pp. 163-175, 
University of Limburg. 

Zuidema, C. (1974). Chess: How to Program the Exceptions? Technical Report IW21/74, 
Department of Informatics, Mathematical Center Amsterdam. 




MODEL ENDGAME ANALYSIS 



G.McC. Haworth, R.B. Andrist 
guy_haworth@hotmail.com; rba_schach@gmx.ch 



Abstract A reference model of Fallible Endgame Play has been implemented and 
exercised with the chess engine WILHELM. Various experiments have demonstrated 
the value of the model and the robustness of decisions based on it. Experimental 
results have also been compared with the theoretical predictions of a 
Markov model of the endgame and found to be in close agreement. 

Keywords: chess, endgame, experiment, fallibility, Markov, model, theory 



1. Introduction 

In Haworth (2003), a reference model of fallible endgame play was defined 
in terms of a spectrum of Reference Endgame Players (REPs) R_c. The 
REPs are defined as choosing their moves stochastically, using only successor 
positions’ values and depths from an endgame table (EGT). Exploring 
here the parameters of the model and various opponent-sensitive uses of the 
REPs including choice of move, we report on: 

a) the robustness of decisions based on the model, 

given that various parameters of the model may be changed, 

b) the apparent competence of reference player R_20 in an R_20 - R_∞ match, 

c) the distribution of game lengths in that match versus Markov theory, 

d) the probability of beating a 50-move draw claim versus Markov theory, 

e) the apparent competence of carbon and silicon players over the board. 

In Section 2, we revisit the basic concepts and theory of the REP model, 
while in Section 3, we describe the REP implementation in WILHELM 
(Andrist, 2003). In Sections 4 to 7, we focus on the five topics above. Section 8 
summarises and notes some questions arising from this work. 

2. The Reference Endgame Player Model 



A nominated endgame, e.g., chess’ KQKR, is considered to be a system 
with a finite set of states {s_i} numbered from 0 to ns-1.¹ Each state s_i(val, d) 
is an equivalence class of positions of the same theoretical value val and 
depth d. Higher-numbered states are assumed to be less attractive to the side 
to move, which is taken to be White. Thus, for KQKR with the DTC² metric, 
we have maxDTCs (1-0) n_W = 31, (0-1) n_B = 3, and ns = 37 states in total: 

- s_i, i = 0: a 1-0 win, i.e., for White, not requiring a winner’s move³, 

- s_i, 1 ≤ i ≤ 31: 1-0 wins of depth i, 

- s_i, i = 32: theoretical draw, either in the endgame or a subgame, 

- s_i, 33 ≤ i ≤ 35: 0-1 wins, i.e., for Black, of depth 36 - i, 

- s_i, i = 36: a 0-1 win not requiring a winner’s move. 
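
The state numbering for KQKR can be made concrete with a short sketch. The function name state_index and the string labels "win", "draw", and "loss" (from White's point of view) are assumptions made for this illustration.

```python
N_W, N_B = 31, 3            # maxDTC of 1-0 wins and of 0-1 wins in KQKR
NS = N_W + N_B + 3          # 37 states, numbered 0..36

def state_index(val: str, depth: int = 0) -> int:
    """State number for theoretical value `val` and win/loss depth `depth`."""
    if val == "win":                       # 1-0: states 0..31
        if not 0 <= depth <= N_W:
            raise ValueError("win depth out of range")
        return depth
    if val == "draw":                      # state 32
        return N_W + 1
    if val == "loss":                      # 0-1: states 33..36, depth = 36 - i
        if not 0 <= depth <= N_B:
            raise ValueError("loss depth out of range")
        return NS - 1 - depth
    raise ValueError("val must be 'win', 'draw' or 'loss'")
```

Higher state numbers are less attractive to White: quick wins come first, then slow wins, the draw, slow losses, and finally immediate losses.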



The REP R_c in position P chooses stochastically from moves which each 
have a probability proportional to a Preference⁴, S_c(val_s, d_s), where s is the 
move’s destination state with theoretical value val_s and win/loss depth d_s. 
Each move-choice by R_c is independent of previous move-choices. 

We require that {R_c} is a spectrum of players, ranging linearly from the 
metric-infallible player R_∞ via the random player R_0 to the anti-infallible 
player R_-∞. To ensure this, the function S_c(val, d) is required to meet some 
natural criteria, as described more fully and formally in Haworth (2003) and 
in Appendix B. 

Here, we choose, as an S_c(val, d) function meeting those criteria: 

S_c(win, d) = (d + k)^(-c) with k > 0 to ensure that S_c is finite, 

S_c(draw) = S_c(win, n_1) = S_c(loss, n_2) with n_1 > n_W and n_2 > n_B, 

S_c(loss, d) = λ·(d + k)^c, λ being defined by n_1 and n_2 above. 



This ensures, as required, that R_0 prefers no move to any other, that R_c 
with c > 0 prefers better moves to worse moves, and that as c → ∞, the R_c 
increase in competence and tend to infallibility in terms of the chosen metric. 

Although the R_c have no game-specific knowledge, the general REP 
model allows moves to be given a prior, ancillary, weighting v_m based on 
such considerations (Jansen, 1992). Thus, v_m = 0, as used in this paper, 
prevents a move being chosen and v_m > 1 makes it more likely to be chosen. 

The probability T_c(i) of moving to state s_i is therefore: 

T_c(i) = [Σ_{moves to state i} v_m · S_c(s_i)] / [Σ_{all moves} v_m · S_c(s_move)] 
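
The preference function and the resulting move probabilities can be sketched as below. Several details are assumptions made for illustration only: the default values k = 1, n1 = 32, and n2 = 4 (chosen merely to satisfy n1 > n_W = 31 and n2 > n_B = 3 for KQKR), the string labels for theoretical values, and the function names. The factor lam follows from requiring S_c(win, n1) = S_c(loss, n2).

```python
def preference(c: float, val: str, d: int = 0,
               k: float = 1.0, n1: int = 32, n2: int = 4) -> float:
    """Preference S_c(val, d) of a move whose destination has value `val`
    and win/loss depth `d`, seen from the side to move."""
    if val == "win":
        return (d + k) ** (-c)
    draw = (n1 + k) ** (-c)            # S_c(draw) = S_c(win, n1)
    if val == "draw":
        return draw
    lam = draw / (n2 + k) ** c         # so that S_c(loss, n2) = S_c(draw)
    return lam * (d + k) ** c          # deeper losses are preferred

def move_probabilities(c: float, moves, **params):
    """`moves` is a list of (val, depth, v_m) triples; returns one
    probability per move, proportional to v_m * S_c(val, depth)."""
    weights = [v_m * preference(c, val, d, **params) for val, d, v_m in moves]
    total = sum(weights)
    return [w / total for w in weights]
```

Summing the probabilities of all moves that lead to the same state gives T_c(i). With c = 0 every preference equals 1, so the player chooses uniformly at random among the moves with v_m = 1.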



1 For convenience, Appendix A summarises the key acronyms, notation, and terms. 

2 DTC = DTC(onversion) = Depth to Conversion, i.e., to mate and/or change of material. 

3 i.e., mate, achieved conversion to won subgame, or loser forced to convert on next move. 

4 For convenience and clarity, the Preference Function S_c(val_s, d_s) may be signified by the 
more compact notations S_c(val, d) or merely S_c(s) if the context allows. 







3. Implementing the REP Model 

The second author has implemented in WILHELM (Andrist, 2003) a subset 
of the REP model which is sufficient to provide the results of this paper. 

Ancillary weightings v_m are restricted to 1 and 0. v_m = 0 is, if relevant, 
applied to all moves to a state s rather than to specific moves: it can be used 
to exclude moves losing theoretical value, and/or to emulate a search horizon 
of H moves, within which a player will win or not lose if possible. 

WILHELM offers five agents based on the REP model: these are, as defined 
below, the Player, Analyser, Predator, Emulator, and Predictor. A 
predefined number of games may be played between any two of WILHELM, 
Player, Predator, Emulator and an infallible player with endgame data. 
WILHELM also supports the creation of Markov matrices, see Section 5. 

3.1 The Player 

The Player is an REP R_c of competence c, and therefore chooses its 
moves stochastically using a validated (pseudo-)random number generator in 
conjunction with the function S_c(val, d) defined earlier. 

3.2 The Analyser 

Let us imagine that an unknown fallible opponent is actually going to 
play as an R_c with probability p(x)·δx that c ∈ (x, x + δx): ∫ p(x) dx = 1. 

The Analyser attempts to identify the actual, underlying c of the R_c which 
it observes. For computational reasons, the Analyser must assume that c is a 
value from a finite set {c_j} and that c = c_j with initial probability pc_{0,j}. 

Here, the c_j are regularly spaced in [c_min, c_max] as follows: 

c_min = c_1, c_j = c_1 + (j-1)·cδ and c_max = c_1 + (n-1)·cδ, i.e. c = c_min(cδ)c_max. 

The notation c = c_min(cδ)c_max is used to denote this set of possible values 
c. The initial probabilities pc_{0,j} may be 1/n, the usual ‘know nothing’ uniform 
distribution, or may be based on previous experience or hypothesis. They are 
modified, given a move to state s_next, by Bayesian inference: 

T_j(next) = Prob[move to state s_next | c = c_j], and 

pc_{i+1,j} = pc_{i,j} · T_j(next) / Σ_k [pc_{i,k} · T_k(next)]. 

Thus, the new Expected[c] = Σ_j pc_{i+1,j} · c_j. 
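
One step of the Analyser's Bayesian inference can be sketched as follows. The function names are assumptions for this illustration, and the likelihoods T_j(next) are taken as given inputs rather than computed from an endgame table.

```python
def bayes_update(priors, likelihoods):
    """Posterior probabilities pc_{i+1,j} from the priors pc_{i,j} and the
    likelihoods T_j(next) of the observed move under each hypothesis c = c_j."""
    joint = [p * t for p, t in zip(priors, likelihoods)]
    total = sum(joint)
    return [x / total for x in joint]

def expected_c(cs, probs):
    """Expected[c] as the probability-weighted mean of the candidate values c_j."""
    return sum(c * p for c, p in zip(cs, probs))

# Three hypotheses c = 0, 1, 2 with a uniform 'know nothing' prior.
cs = [0.0, 1.0, 2.0]
prior = [1.0 / 3.0] * 3
# Hypothetical likelihoods of one observed move under each hypothesis.
posterior = bayes_update(prior, [0.2, 0.3, 0.5])
```

Because the prior here is uniform and the likelihoods already sum to 1, the posterior equals the likelihoods, and the estimate of c moves from 1.0 towards the hypothesis that best explains the observed move. Repeating the update after every observed move yields the running estimate of apparent competence.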

In Subsection 4.1, we investigate what values should be chosen for the 
parameters c_min, cδ and c_max so that the errors of discrete approximation are 
acceptably small. 

3.3 The Predator 

On the basis of what the Predator has learned from the Analyser about its 
opponent, it chooses its move to best challenge the opponent, i.e., to opti- 
mise the expected value and depth of the position after a sequence of moves. 
As winning attacker, it seeks to minimise expected depth; as losing defender, 
it seeks to maximise expected depth. In a draw situation, it seeks to finesse a 
win. 

Different moves by the Predator create different sets of move-choices for 
the fallible opponent. These in turn lead to different expectations of theoreti- 
cal value and depth after the opponent’s moves. 

The Predator implementation in WILHELM chooses its move on the basis 
of only a 2-ply search. It may be that deeper searches will be worthwhile, 
particularly in the draw situation. 

3.4 The Emulator 

The Emulator E_c is conceived as a practice opponent with a ‘designer’ 
level of competence tailorable to the requirements of the practising player. 
An REP R_c will exhibit an apparent competence c' varying, perhaps widely, 
above and below c because it chooses its moves stochastically. In contrast, 
the Emulator E_c chooses a move which exhibits to an Analyser an apparent 
competence c'' as close to c as possible. 

The reference Analyser is defined as initially assuming the Emulator is 
an R_x, x = 0(1)2c, where x = x_j with initial probability 1/(2c+1). 

The Emulator E_c therefore opposes a practising player with a more 
consistent competence c than would R_c, albeit with some loss of variety in its 
choice of moves. The value c can be chosen to provide a suitable challenge 
in the practice session. 

The practising player may also have their apparent competence assessed 
by the Analyser. 

3.5 The Predictor 

The Predictor is advised of the apparent competence c of the opponent. It 
then predicts how long it will take to win, or what its chances are of turning 
a draw into a win, using data from an Analyser and from a Markov model of 
the endgame. This model is defined in Section 5. 

4. Robustness of the Model 

The two famous Browne-BELLE KQKR exhibition games have already 
been studied using the REP model (Haworth, 2003). Browne’s apparent 
competence c was assessed by an Analyser, and BELLE’s moves as Black 
were compared with the decisions of a Predator using the Analyser's output. 
In that analysis, the following six choices were made: 

- c_min = 0, cδ = 1, c_max = 50; all c_j were deemed equally likely, 

- k = 0+ (i.e., arbitrarily small, effectively zero) and metric = DTC. 

The following question therefore arises: to what extent are the conclusions 
of the Analyser and the choices made by the Predator affected by these 
six choices? Our first studies addressed this question. 

4.1 The Effect of Numerical Approximation 

Browne-BELLE game 1 was first reanalysed, this time with κ = 1, and: 

c_min = 0, c_max = 50 and c_δ in turn set to 0.01, 0.1, 1, 2, 5 and 10. 

Figure 1 takes the Analyser with c_δ = 0.01 as a benchmark, and shows 
how the choice of c_δ affected the Analyser's inferences during play. 

It may be shown that the Analyser's Bayesian calculation is a discrete 
approximation to a calculable integral: the theory of integration therefore 
guarantees that this calculation will converge as c_δ → 0. We judge that the 
error is ignorable with c_δ = 1 and that no smaller c_δ is needed. 
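The convergence of the grid calculation can be illustrated numerically. The following is a minimal sketch, not the WILHELM implementation: it assumes a preference function S_c(win, d) = (d + κ)^(-c), in line with the (d + κ)^(-c) term listed in Appendix A, assumes that move probabilities are proportional to S_c, and uses invented observations. The names `move_probs`, `analyse` and `posterior_mean` are hypothetical.

```python
import math

def move_probs(depths, c, kappa=1.0):
    """Probability of each winning move, assuming S_c(win, d) = (d + kappa)**(-c)."""
    logs = [-c * math.log(d + kappa) for d in depths]
    top = max(logs)                      # subtract the maximum for numerical stability
    weights = [math.exp(x - top) for x in logs]
    total = sum(weights)
    return [w / total for w in weights]

def analyse(observations, c_min=0.0, c_max=50.0, c_delta=1.0, kappa=1.0):
    """Bayesian inference over candidate c_j on a grid, starting from a uniform prior."""
    n = int(round((c_max - c_min) / c_delta)) + 1
    grid = [c_min + j * c_delta for j in range(n)]
    log_post = [0.0] * n                 # uniform prior: a common constant, so omitted
    for depths, chosen in observations:
        for j, c in enumerate(grid):
            log_post[j] += math.log(move_probs(depths, c, kappa)[chosen])
    top = max(log_post)
    post = [math.exp(x - top) for x in log_post]
    total = sum(post)
    return grid, [p / total for p in post]

def posterior_mean(grid, post):
    return sum(c * p for c, p in zip(grid, post))

# Invented data: 30 positions, each offering wins of depth 10, 11 and 12;
# the observed player picks the best move 24 times, the second 5 times, the worst once.
observations = ([([10, 11, 12], 0)] * 24
                + [([10, 11, 12], 1)] * 5
                + [([10, 11, 12], 2)])

coarse = posterior_mean(*analyse(observations, c_delta=1.0))
fine = posterior_mean(*analyse(observations, c_delta=0.1))
print(f"c_delta = 1.0: {coarse:.3f}   c_delta = 0.1: {fine:.3f}")
```

On data like this the two estimates should differ only slightly, which is the behaviour the text reports for c_δ = 1 against the c_δ = 0.01 benchmark.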




Figure 1. Differences in c-estimation, relative to the c_δ = 0.01 estimate. 
The analysis of the game was then repeated with: 

c_δ = 1, c_min = 0 and c_max in turn set to 100, 90, 80, 70, 60, 50, 40 and 30. 

Again, intuitively, we would expect the error introduced by a finite c_max 
to reduce as c_max → ∞. Figure 2 shows that this is indeed the case and that, 
with Browne's apparent c ≈ 20, c_max = 50 is conservative enough. However, 
it may need to be larger for easier endgames. 

We assume that our opponent has positive apparent competence c and 
that the Analyser is correct in taking c min = 0 as a lower bound on c. 

4.2 The Effect of κ 

Given the requirements on S_c(val, d), it may be shown⁵ that, as κ 
increases, R_c progressively loses its ability to differentiate between better 
and worse moves, that R_c's expectation of state and theoretical value do not 
improve and that R_c → R_0. Thus, for a given set of observations, if the 
Analyser assumes a greater κ, it will infer a greater apparent competence c. 

In this paper, we choose a fixed κ = 1 throughout, as it were, recognising 
the next move in the line contemplated. We have not tested the effect of 
different κ on a Predator's choices of move, but assume it is not great. There 
seems little reason to choose one value of κ over another. 

5 The proof is by elementary algebra and in the style of Theorem 3 (Haworth, 2003). 
4.3 The Effect of the Initial Probability Assumption 

The usual, neutral, initial stance is a 'know nothing' one, assuming that c is 
uniformly distributed in a conservatively wide interval [c_min, c_max]. However, 
it is clear that had BELLE been using the REP model, it could have started 
game two with its perception of Browne as learned from game one, just as 
Browne started that game with his revised perception of KQKR. Also, one 
might have a perception of the competence c likely to be demonstrated by 
the opponent with the given endgame force, and choose this to be the 
midpoint of a [c_min, c_max] range with a normal distribution. 

Bayesian theory, see Subsection 3.2, shows that the initial, assumed non- 
zero probabilities continue to appear explicitly in the calculation of subse- 
quent, inferred probabilities. We therefore conclude that initial probabilities 
have some effect on the inferred probabilities. 
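The persistence of the initial probabilities can be made explicit with a short derivation. This is a sketch in assumed notation rather than a reproduction of Subsection 3.2: p_{0,j} is the initial probability that c = c_j, p_{i,j} the probability inferred after the i-th observed move m_i, and q_j(m_i) (a symbol introduced here) the probability that R_{c_j} would choose m_i.

```latex
\[
  p_{i,j} \;=\; \frac{p_{i-1,j}\, q_j(m_i)}{\sum_{k} p_{i-1,k}\, q_k(m_i)}
  \qquad\Longrightarrow\qquad
  p_{n,j} \;=\; \frac{p_{0,j} \prod_{i=1}^{n} q_j(m_i)}
                     {\sum_{k} p_{0,k} \prod_{i=1}^{n} q_k(m_i)} .
\]
```

Unrolling the recursion shows that each p_{0,j} survives as a multiplicative weight on the likelihood of the observed moves, so a non-uniform prior shifts the result, while any candidate given zero initial probability can never be inferred.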

4.4 The Effect of the Chosen Metric 

The metric Depth to Conversion (DTC) was chosen because conversion 
is an obvious intermediate goal in most positions. The adoption of DTC is 
however a chessic decision. 

Our analysis of the Browne-BELLE games shows that the Predator would 
never have made a DTC-suboptimal move-choice for Black. It is reasonable 
to assume that, had DTM(ate) been the chosen metric, it would never have 
chosen a DTM-suboptimal move. 

However, different metrics occasionally define different subsets of 
moves as metric-optimal. Where this occurs, the Predator might well choose 
a different move in its tracking of the Browne-BELLE games. 

5. A Markov Model of the Endgame 

Let us assume that the Preference Function S_c(val, d) is fixed, e.g., as the 
function defined here with κ = 1. 

Given a position P in state s_i, we can calculate the probability of R_c 
choosing move m to some position P' in state s_j. We may therefore calculate 
the probability, T_c(j), of moving from position P to state s_j. Averaging this 
across the endgame over all such positions P in state s_i, we may derive the 
probability m_ij of a state transition s_i → s_j assuming initial state s_i. 

The {m_ij} define a Markov matrix M_c = [m_ij] for player R_c. This matrix, 
and the predictions which may be derived from it, provide a characterisation 
of the endgame as a whole. 

Let us assume that the initial position is 1-0, in state s_i, and that R_c does 
not concede the win. From the matrix, we may derive predictions such as: 




G.McC. Haworth, R.B. Andrist 



- the probability of R_c winning on or before move m, 

- the expected number of moves required for R_c to achieve the win. 

this is L_i in the solution of (I - M_c)·L = U,⁶ q.v. (Haworth, 2003). 
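The expected-length calculation can be sketched on a toy chain. The example below is illustrative only: the three-state matrix is invented, state 0 is taken to be the absorbing 'win converted' state, and the linear system is restricted to the non-absorbing states so that I - M is invertible (an assumption about how the equation is applied). The helper names `solve` and `expected_moves` are hypothetical.

```python
def solve(a, b):
    """Solve a.x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    rows = [list(a[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(rows[r][col]))
        rows[col], rows[pivot] = rows[pivot], rows[col]
        for r in range(col + 1, n):
            factor = rows[r][col] / rows[col][col]
            for k in range(col, n + 1):
                rows[r][k] -= factor * rows[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(rows[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (rows[i][n] - tail) / rows[i][i]
    return x

def expected_moves(m):
    """Expected moves to reach state 0 from each other state.

    m[i][j] is the probability of a transition s_i -> s_j; state 0 is absorbing.
    Solves (I - Q).L = U, where Q is m without row and column 0 and U is all ones.
    """
    n = len(m)
    a = [[(1.0 if i == j else 0.0) - m[i][j] for j in range(1, n)]
         for i in range(1, n)]
    return solve(a, [1.0] * (n - 1))

# Invented chain: depth 0 = converted, depths 1 and 2 = win still to be converted.
M = [[1.0, 0.0, 0.0],
     [0.5, 0.3, 0.2],
     [0.0, 0.6, 0.4]]

L = expected_moves(M)   # L[0] is the expectation from depth 1, L[1] from depth 2
print(L)
```

For this toy matrix the expectations are 8/3 moves from depth 1 and 13/3 moves from depth 2, which can be checked by substituting into L_1 = 1 + 0.3 L_1 + 0.2 L_2 and L_2 = 1 + 0.6 L_1 + 0.4 L_2.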

These theoretical predictions have been computed and are compared with 
the results of the extensive experiment described in the next section. 



6. An Experiment with R_20 

Echoing Browne-BELLE, a model KQKR match was staged between the 
fallible attacker R_20 and the infallible defender R_∞. It was assumed that R_20 
would not concede the win but eventually secure it as theory predicts. The 
game-specific repetition and 50-move drawing rules were assumed not to be 
in force. Table 1 summarises the results of this experiment. 

1,000 games were played from each of the two maxDTC KQKR posi- 
tions used in the Browne-BELLE match. Games ended when conversion was 
achieved by White. The purpose of the experiment was to observe: 

- the distribution of the c inferred by an Analyser⁷ at the end of each game 

with the assumed probability of c_j set to 1/51 at start of each game, 

- the distribution of the lengths of the games, and 

- the trend in the Analyser's inferred c, ignoring game-starts after the first. 



KQKR: R_20 - R_∞                     Position 1   Position 2   Overall 

Min., end-of-game apparent c            15.06        14.73       14.73 
Max., end-of-game apparent c            35.66        40.71       40.71 
Mean, end-of-game apparent c            21.318       21.620      21.469 
St. Dev., end-of-game apparent c         3.345        3.695       3.524 
St. Dev of the Mean apparent c           0.106        0.117       0.079 
|Mean c - 20|/Stdev_mean                12.43        13.85       18.59 

Min. moves, m, to conversion            37           37          37 
Maximum moves, m                       395          325         395 
Mean moves, m                           96.88        94.31       95.60 
St. Dev., m                            102.951      102.273     102.587 
St. Dev., mean of m                      3.256        3.234       2.294 

Table 1. Statistical Analysis of the 2,000-game experiment. 



6 I is the Identity matrix; U is a vector where each element is the unit '1'. 

7 using c_min = 0, c_δ = 1 and c_max = 50 as found adequate in Section 4.1. 
6.1 R_20's Apparent c after One Game 

Figure 3 shows the distribution of the apparent c as inferred at the end of 
each single game: the mean c is 21.50 ± 0.08.⁸ This rather surprised us, 
being more distant from the actual c = 20 than expected. The reason is that the 
mean of {end-of-game estimated c} is not statistically the best way to 
estimate the underlying c, a task we revisit in Subsection 6.4. 




6.2 Game-Length Statistics 

Starting from the two positions with (maximum) DTC depth 31, and 
taken over the 2,000 games, the mean number of moves required for conver- 
sion is 95.60 ± 2.29. Figure 4 shows the distribution of the experiment’s 
game lengths in comparison with the predictions of the Markov model. 

Figure 5 shows the Markov-model predictions for the expected number 
of moves to conversion, for c = 20, 21, and 22 and starting at any depth. 
Note that it shows that the main barriers to progress seem to be between 
depths 17 and 26 rather than at the greatest depths. 

From depth 31, the moves predicted are 97.20 for c = 20, 83.70 for c = 21 
and 74.16 for c = 22. The experimental results are therefore in close agree- 
ment with these predictions, indicating a c of ~20.1. 
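The figure of about 20.1 can be recovered by interpolating between the two nearest predictions. The worked step below assumes simple linear interpolation in c between the c = 20 and c = 21 values, which the text does not state explicitly.

```latex
\[
  c \;\approx\; 20 + \frac{97.20 - 95.60}{97.20 - 83.70}
    \;=\; 20 + \frac{1.60}{13.50}
    \;\approx\; 20.12 .
\]
```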



8 Mean end-of-game apparent c is still 21.04 when the Analyser's c_max is 30 rather than 50. 

Figure 4. Distribution of game lengths in the R_c-R_∞ KQKR match. 




6.3 The Probability of Winning 

The games were played without the 50-move rule but the Markov model 
allows us to calculate the probability of winning from depth d on or before 
move 50, before a possible draw claim by the opponent. It is the probability 
of being in state 0 after 50 moves, namely the element M_c^50[d, 0] of M_c^50. 

Figure 6 gives these probabilities for c = 20, 21, and 22 and for all initial 
depths. For c = 20 and initial depth 31, this is 12.67%, a figure reached after 
55 moves in the 2,000 game experiment. 
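The matrix-power calculation can be sketched as follows. This is an illustration on an invented three-state chain, not the KQKR matrix M_c: state 0 is assumed to be the absorbing 'converted' state, and `mat_mul`, `mat_pow` and `win_probability` are hypothetical helper names.

```python
def mat_mul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_pow(m, power):
    """m raised to a non-negative integer power, by repeated squaring."""
    n = len(m)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    base = [row[:] for row in m]
    while power > 0:
        if power & 1:
            result = mat_mul(result, base)
        base = mat_mul(base, base)
        power >>= 1
    return result

def win_probability(m, depth, moves):
    """Probability of being in absorbing state 0 after `moves` moves from `depth`."""
    return mat_pow(m, moves)[depth][0]

# Invented chain: state 0 = converted, states 1 and 2 = win still to be converted.
M = [[1.0, 0.0, 0.0],
     [0.5, 0.3, 0.2],
     [0.0, 0.6, 0.4]]

for moves in (1, 2, 50):
    print(moves, win_probability(M, 2, moves))
```

Because state 0 is absorbing, the element [d][0] of the n-th power accumulates every game that has converted on or before move n, which is why a single matrix element answers the question.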




Figure 6. Probability[R_c wins an R_c-R_∞ KQKR game in 50 moves]. 




Figure 7. Analyser error in c-estimate versus number of games analysed. 
6.4 Analysing R_c's competence c 

The mean of the 2,000 end-of-game apparent c values is not actually the 
best estimate of R_c's underlying c. 

The reason is that the 2,000 games may be seen as some 191,200 
independent move-choices by the R_c. There is no need to associate the 'know 
nothing' uniform distribution probabilities with the possible c_j more than 
once. In fact, to do so is to interrupt the Bayesian inference process of the 
Analyser and to negate what the Analyser has learned from previous games 
about the non-uniform distribution of probabilities of the candidate c_j. 

Figure 7 shows the Analyser's perception of c approaching the correct 
value of 20 as it works through the 2,000 games. Even starting with an 
estimate of c = 25, it is accurate to 0.1 after examining 6,000 moves. 

7. Apparent Competence of Players 

The apparent competence of both carbon and silicon players has been 
calculated for some published games. The initial assumptions differed 
slightly from those of Haworth (2003): here, WILHELM was set to analyse 
with candidate c = 0 (0.01) 50 and κ = 1 and the results are listed in Table 2, 
showing depth conceded by both sides, net progress and apparent c. 

Some background to the games may help put the figures in context. The 
two Browne-BELLE games (Fenner, 1979; Haworth, 2003; Jansen, 1992a; 
Levy and Newborn, 1991) are the famous demonstration that the ‘easy’ 
KQKR endgame is not so easy to win. Gelfand-Svidler was a tie-breaker 
rapid-play game played under extreme time pressure. Pinter-Bronstein has 
been extensively analysed by Roycroft (1988). Timman consulted exten- 
sively in a prior adjournment (Breuker et al, 1992). Fritz (Heise, 2002) 
played itself with only 3-to-4-man EGTs in an Intel-AMD duel. Lengyel lost 
the draw three times before our analysis begins (Levy, 1972a, b, 1992). 



                                                  depth lost   depth    Final c 
#  Profile  White     Black      Res.  Year   #m   Wh.   Bl.    gain    Wh.    Bl. 

1  KQKR     Browne    BELLE 1    =     1978   45    27     0     18    19.5     ∞ 
2  KQKR     Browne    BELLE 2    1-0   1978   50    19     0     31    18.4     ∞ 
3  KRKQ     Gelfand   Svidler    =     2001   50    37    79      8     3.5    4.1 
4  KNKBB    Pinter    Bronstein  =     1977   50    60    95     14     5.9   15.4 
5  KBBKN    Popovich  Korchnoi   =     1984   31    74    45      1     8.1    6.6 
6  KBBKN    Timman    Speelman   1-0   1992   25    36    44     33    15.0    7.6 
7  KNKBB    Fritz     Fritz      =     2002   49   169   210      8     2.9    3.2 
8  KQKQN    Lengyel   Levy       0-1   1972   14     7    16      5     4.2    2.8 

Table 2. Apparent Competence of Players. 
Even after noting that the values c are not necessarily the player's true 
equivalent c, and are meaningful only in relative rather than absolute terms, 
the performances of Browne and Timman stand out. Fritz trades major 
depth with its clone-opponent and clearly misses the withheld perfect 
information. The time constraints of Rapid Play, and even third-phase 30'/game 
play in classical chess, militate against quality endgame play, arguably a 
loss to the world of chess. 

8. Summary 

We have examined the utility of a reference model of Fallible Endgame 
Play by both experiment and theory, using both a comprehensive REP im- 
plementation in WILHELM and Markov methods. Various demonstrations 
have shown opportunities for exploiting the model, and the robustness of 
decisions based on it. Experimental results have also been compared with the 
Markov predictions, with which they agree closely. 

Experiments which remain to be carried out include: 

- infallible White attacking fallible Black in a drawn position 

e.g., in KBBKN, KNNKP, KNPKN, KQNKQ, KQPKP, or KRBKR, 

- infallible Black pressing for a draw in a lost position 

this requires additional EGT data on draws forced in d moves, 

- a more insightful Predator searching more than 2 p plies ahead, and 

- use of the Emulator as a training partner for human players. 

The REP model may be extended to other games where EGTs may be 
computed - to convergent games such as Chinese Chess, 8x8 checkers, In- 
ternational Draughts, and in principle if not in practice, to divergent place- 
ment games such as Hex and Othello. 

If a search method can propose what it considers the best few moves in a 
position, each evaluated on an identical basis and therefore comparable, the 
concept of a stochastic player may be applied more generally than to just 
endgames for which perfect information is available. 
Appendix A: Acronyms, Notation and Terms 

Analyser    an agent identifying a fallible opponent as an R_c player 
c           the competence index of an REP 
c_δ         the difference between adjacent c_j assumed by the Analyser 
c_max       the maximum c assumed possible by the Analyser 
c_min       the minimum c assumed possible by the Analyser 
d           the depth (of win or loss) of a position in the chosen metric, e.g. DTC 
DTC         Depth to Conversion, i.e. to change of material and/or mate 
DTM         Depth to Mate 
Emulator    an agent, E_c, choosing moves to best exhibit apparent competence c 
Horizon     a search limit, within which R_c will win or not lose if possible 
κ           κ > 0 ensures that (d + κ)^-c is finite 
λ           a scaling factor, matching the probability of loss to that of a draw 
L_i         expected length of win (to conversion in winner's moves) from depth i 
maxDTC      maximum DTC (depths) 
M_c         a Markov matrix [m_ij] 
m_ij        the probability, averaged over the endgame, that R_c in state s_i moves to s_j 
metric      a measure of the depth of a position, usually in winner's moves 
n           the number of different c_j assumed by an Analyser 
n_1         n_1 > n_W, ensures that draws are less preferable than wins 
n_2         n_2 > n_B, ensures that draws are more preferable than losses 
n_B         the number of 'Black win' states 
n_W         the number of 'White win' states 
n_S         the number of states for a chosen endgame and depth metric 
p(x)·δx     the probability that R_c's c ∈ [x, x + δx] 
p_0,j            the a priori (before a move) probability that the unknown c is c_j 
p_i,j            the probability, inferred after the ith move, that the unknown c is c_j 
Player           an R_c, choosing its moves stochastically with Preference Function S_c 
Predator         an agent, choosing the best move possible on the basis of an opponent-model 
Predictor        an agent predicting the longer term prospects of a result from Markov theory 
REP              Reference Endgame Player 
R_0              the REP which prefers no move to any other 
R_c              an REP of competence c 
R_∞              the player which plays metric-optimal moves infallibly 
s                endgame state 
s_i              (endgame) state i 
S_c(val_s, d_s)  the Preference Function for REP R_c, a function of destination value and depth 
S_c(val, d)      a convenient contraction of S_c(val_s, d_s) 
S_c(s)           a more convenient contraction of S_c(val_s, d_s) 
T_c(i)           the probability that R_c moves to state i, s_i 
val              the theoretical value of a position, i.e., win, draw or loss 
v_m              a weighting that may be given to a move on chessic grounds 



Appendix B: Preference Functions 

We require that the set {R_c} is in fact a linear, ordered spectrum of R_c players such that: 

- for R_0, all moves are equally likely, 

- 'R_∞' = lim_{c→∞} R_c exists and is the infallible player choosing metric-optimal moves, 

- 'R_-∞' = lim_{c→-∞} R_c exists and is the anti-infallible player choosing anti-optimal moves, 

- c2 > c1 ⇒ R_c2's expectations of successor state, i.e. E[s], are no worse than R_c1's, 

- c2 > c1 ⇒ R_c2's expectations of theoretical value, i.e. E[val_s], are no worse than R_c1's. 

The following requirements on S_c(val, d) = S_c(s) are natural ones and sufficient to ensure 
the above, as proved in Haworth (2003): 

- S_c(s) is finite and positive: no move has zero or infinite preference for finite c,⁹ 

- S_0(s) is a constant, 

- for some n_1 > n_W and n_2 > n_B, S_c(draw) = S_c(win, n_1) = S_c(loss, n_2), 

- F_i(c) = S_c(s_{i+1})/S_c(s_i) decreases as c increases: lim_{c→∞} F_i(c) = 0 and lim_{c→-∞} 1/F_i(c) = 0, 

- for c ≠ 0, sign(c)·S_c(s_j) decreases (↓) as j increases (↑), 

- for c > (<) 0, W_c(d) = S_c(win, d)/S_c(win, d+1) ↓ (↑) as d ↑ and lim_{d→∞} W_c(d) = 1, 

- for c > (<) 0, L_c(d) = S_c(loss, d+1)/S_c(loss, d) ↓ (↑) as d ↑ and lim_{d→∞} L_c(d) = 1. 

The net effect is that: 

- the spectrum of R_c is centred as required on the random player, R_0, 

- the R_c with c > 0 prefer better moves to worse moves, 

- the R_c demonstrate increasing apparent skill as c → ∞, 

- R_c can be arbitrarily close to being the metric-infallible player for finite c 

- as R_c discriminates less between a win (or loss) of depth d and one of depth d+1. 



9 Hence the requirement that κ > 0, to accommodate the case of d = 0 in (d + κ)^-c. 
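Some of these requirements can be checked numerically for one candidate function. The sketch below assumes S_c(win, d) = (d + κ)^(-c), the form suggested by footnote 9; it covers only won positions, and `preference` and `win_ratio` are hypothetical names introduced for illustration.

```python
def preference(d, c, kappa=1.0):
    """Assumed preference for a win of depth d: S_c(win, d) = (d + kappa)**(-c)."""
    return (d + kappa) ** (-c)

def win_ratio(d, c, kappa=1.0):
    """W_c(d) = S_c(win, d) / S_c(win, d + 1)."""
    return preference(d, c, kappa) / preference(d + 1, c, kappa)

c = 20
ratios = [win_ratio(d, c) for d in range(50)]

# For c > 0, W_c(d) should exceed 1, decrease as d grows, and tend to 1.
print(ratios[0], ratios[10], ratios[49], win_ratio(10**6, c))

# For c = 0 the preference is constant, so R_0 treats all moves alike.
print([preference(d, 0) for d in (0, 5, 31)])
```

With κ > 0 the value at d = 0 stays finite, which is the point of the footnote; with κ = 0 the expression would divide by zero for a mate-in-hand position when c > 0.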
Abstract While Nalimov's endgame tables for Western Chess are the most used today, 
their Depth-to-Mate metric is not the only one and not the most effective in 
use. The authors have developed and used new programs to create tables to 
alternative metrics and recommend better strategies for endgame play. 

Keywords: chess, conversion, data, depth, endgame, goal, move count, statistics, strategy 



1. Introduction 

Chess endgames tables (EGTs) to the ‘DTM’ Depth to Mate metric are 
the most commonly used, thanks to codes and production work by Nalimov 
(Nalimov, Haworth, and Heinz, 2000a, b; Hyatt, 2000). DTM data is of inter- 
est in itself, even if conversion, i.e., change of force, is usually adopted as an 
interim objective in human play. However, more effective endgame strate- 
gies using different metrics can be adopted, particularly by computers 
(Haworth, 2000, 2001). A further practical disadvantage of the DTM EGTs 
is that, with more men, DTM increases and file-compression becomes less 
effective. 

Here, we focus on metrics DTC, DTZ¹ and DTZ_50²; the first two were 
previously used by Thompson (1986, 2000) and Wirth (1999). New programs 
by Tamplin (2001) and Bourzutschky (2003) have enabled a complete 
suite of 3-to-5-man DTC/Z/Z_50 EGTs to be produced. 

Section 2 outlines these two new algorithms. Sections 3 to 5 review the 
new DTC, DTZ and DTZ_50 data tabled in the Appendix. Finally, improved 
endgame strategies are recommended for the 50-move context. 



1 DTC = Depth to Conversion, i.e., to force change and/or mate. 

DTZ = Depth to (Move-Count) Zeroing (Move), i.e., to P-push, force change and/or mate. 

2 DTZ_k = DTZ, but draw if the 'win' can be pre-empted by a k-move draw claim. 
2. New Approaches to EGT Generation 

Below we briefly describe two new approaches to EGT generation. The 
first one is described adequately in the literature; the second, so far, is not. 

2.1 Tamplin’s Wu-Beal Code 

Tamplin (2001) combined the Wu-Beal (2001a, b) algorithm with 
Nalimov indexing in a new code whose objectives were primarily Nalimov- 
compatibility, simplicity, maintainability and portability. Most pawnless 3- 
to-5-man DTC EGTs were generated, the new code including an inverse- 
index function mirroring Nalimov's index function. 

2.2 Bourzutschky’s Modified-Nalimov Code 

Bourzutschky (2003) modified Nalimov's DTM-code to enable it also to 
generate EGTs to metrics DTC* and DTZ*. This involved generalising some 
DTM-specific aspects of the algorithm, as well as the obvious changes to the 
iterative formula for deriving depth. For DTC, the code retains the efficien- 
cies of the DTM-code while requiring maxDTC rather than maxDTM cycles. 
Because EGT generation to the DTZ metric has not yet been implemented 
generically as a sequence of sub-EGT generations, each based on a fixed 
pawn structure, this is not the case for DTZ* computations. These can also 
require somewhat more than DTC cycles but the difference is insignificant. 

3. The DTC Data 

DTC EGTs are interesting, not only for completeness, but because con- 
version is an intuitively obvious objective and the DTC EGTs document 
precisely the phase of play when the material nominated is on the board. 

The remaining 3-to-5-man DTC EGTs were generated. Table 1 in the 
Appendix lists for each endgame the number of positions of maxDTC, 
wtm/btm and 1-0/0-1. The ICGA (2003) website provides further data, 
including %-wins, illustrative maxDTC positions and DTC-minimaxing lines. 
Because there are many wins in 1, the % of positions won does not 
characterise well the presence of wins in an endgame. Similarly, maxDTC is not a 
good indicator. We therefore suggest a new characteristic, 

Win-Presence = %_of_positions_won × (Average DTC of Win) 

This is not unduly affected by the usual peak of wins in 1 or by the long 
tail of deep wins, and is in fact related to the number of moves for which a 
win is present on the board. 
3.1 A Review of the DTC Data 

A first housekeeping point to be made is that this data often differs from 
Wirth's data (Wirth and Nievergelt, 1999; Tamplin, 2003). The explanation 
is simple. First, Wirth has exactly one representative of each equivalence 
class of positions, including the harder case of both Kings being on a1-h8. 
Nalimov would count {wKc3Qb3(c2)/bKa1} as two positions rather than 
one. 

Second, Wirth's code, based on the inherited RetroEngine, assumes 
that all conversions are effected by the winner. This is not so: the loser is 
sometimes forced to convert to loss, e.g., {wKe1Qb1Rf1/bKa1}, in which 
case Wirth's depth is too great by one. 

Tamplin's (2003) and Bourzutschky's (2003) codes both measure depth 
consistently in winner's moves. Also, they do not allow 'realistic' but 
voluntary conversions, e.g., {wKe1Qf1Rb1/bKa1}, by the loser, a feature of 
Thompson's original DTC EGT code (Thompson, 1986) which chose to 
move to the position with greatest DTC even if a capture was involved. 

The sub-6-man compressed DTC EGTs are 62.1% the size of the DTM 
EGTs, usefully saving 2.8GB disc space. 

The maxDTC=114 wins in KNNKP and KQPKQ are already known. 
KBNK wtm scores the highest in Win-Presence terms: maxDTC = 33, aver- 
age DTC = 24.68 and 99.51% of positions are 1-0 wins. 
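As a worked instance of the Win-Presence characteristic, the sketch below applies the formula to the KBNK wtm figures just quoted. It assumes that '%_of_positions_won' enters as a fraction of positions, so that the result reads as a number of moves; if the percentage is used directly the value is simply 100 times larger. The function name `win_presence` is hypothetical.

```python
def win_presence(percent_won, average_dtc):
    """Win-Presence, treating the percentage of won positions as a fraction."""
    return (percent_won / 100.0) * average_dtc

# KBNK wtm figures from the text: 99.51% of positions won, average DTC 24.68.
kbnk = win_presence(99.51, 24.68)
print(f"KBNK wtm Win-Presence: {kbnk:.2f}")   # prints 24.56
```

Read this way, a win is present on the board for roughly 24.6 moves on average across KBNK wtm positions, which is why the endgame scores highly despite a modest maxDTC of 33.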

4. The DTZ Data 

The DTZ metric is necessary if the length of the current phase of play is 
to be guarded in the context of chess’ k-move rule, k currently being 50. It 
was used pragmatically by Thompson (1986) to compute the KQPKQ and 
KRPKR EGTs when RAM was relatively scarce. 

Bourzutschky (2003) generated some DTZ EGTs where maxDTZ > 50 
and Tamplin (2003) completed the sub-6-man DTZ EGT suite. The 
computation continues to be a major feat as it cannot currently use 
Nalimov's bitvector-based algorithm which reduces RAM requirements by a 
factor of 4 to 16. 

Table 2 in the Appendix lists the results which differ from the DTC data. 
KNNKP with maxDTZ = 82 features the deepest endings. DTZ EGTs are 
commendably compact relative to DTM and DTC EGTs. The KPPPK wtm 
DTZ EGT is an extreme example, being only 2% the size of the DTM EGT. 
In total, the sub-6-man compressed DTZ EGTs are 52.9% the size of the 
DTM EGTs, usefully saving some 3.5GB of disc space. 
5. The DTZ_50 Data 

Bourzutschky (2003) and Tamplin (2003) also generated DTZ_50 EGTs, 
not only for those cases where maxDTZ > 50, but for endgames directly or 
indirectly dependent on these as illustrated in Figure 1. The DTZ_50 metric 
rates as wins only those positions winnable against best play given the 
50-move rule. In Figure 1, endgames for which EZ and EZ_50 are potentially but 
not actually different are in brackets, and dotted lines indicate that no 
50-move impact emanates from or feeds back to them. 

The sub-6-man compressed DTZ_50 EGTs are 49.8% the size of the DTM 
EGTs. Table 3 in the Appendix lists 3-to-5-man DTZ_50 EGT data for 
endgames where DTZ_50 ≠ DTZ and Table 7 gives examples of positions 
affected. Table 6 summarises 50-move impact, minimal for KNPKQ, 
considerable for KBBKN and KNNKP. 



PP-P 

(bp-p) (np-p) (pp-b) (pp-n) PP-Q QP-P RP-P 

BB-P BN-P BP-N (bp-q) (bp-r) NN-P NP-N NP-Q QP-Q QR-P (rb-p) RP-B RP-Q (rp-r) 

BB-N BB-Q BN-N NN-Q QR-Q RB-R 

Figure 1. Endgames with EZ_50 ≠ EZ. 



If KwKb is an endgame with wtm and btm 1-0 wins impacted by the 
50-move rule, KwxKb and KwKby are also impacted by the rule. This 
observation, coupled with Thompson's DTC results (Tamplin and Haworth, 2001) 
and the DTM results of Nalimov (Hyatt, 2000) and Bourzutschky (2003), 
indicates that many 6-man endgames are affected. Tamplin (2003) has 
computed some of these 6-man endgames' EGTs to the DTZ and DTZ_50 metrics. 

In contrast with KNNKP, KBBKNN has the majority of its wins 
frustrated, and few wins can be retained by deeper strategy in the current phase. 
There are significant percentages of frustrated 0-1 wins in KBBBKQ, and of 
delayed 1-0 wins in KBBBKN and KBBNKN. 

Elsewhere, there is only the merest hint of the 50-move impact that might 
follow and we would expect that hint to become fainter as the number of 
men increases. 
6. Endgame Strategies 

Let dtx be the depth by, and Ex an EGT to, the metric DTx. Let Sx⁻ be an 
endgame strategy minimising dtx, e.g., SZ⁻ or SZ_50⁻, and let Sx⁺ be a strategy 
maximising dtx. Further, let SZ° be an endgame strategy guarding the length 
of the current phase in the context of a k-move rule and a remaining mleft 
moves before a possible draw claim. By definition, if dtx > mleft, Sx° = Sx⁻. 
Let Sσ1σ2σ3 be an endgame strategy using strategies Sσ1, Sσ2 and Sσ3 in 
turn to subset the choice of moves, e.g., SZ°Z_50⁻M⁻Z⁻ which safeguards 
current phase length and 50-move wins, and then minimises dtm and dtz in turn. 
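The idea of composite strategies successively subsetting the move choice can be sketched as a chain of filters. The code below is one possible reading, not the authors' implementation: each move is a dictionary of invented depths after the move, the guard keeps moves whose dtz does not exceed mleft and falls back to minimising dtz when none qualifies, and all function names are hypothetical.

```python
def minimise(metric):
    """Strategy keeping only the moves with the smallest depth in `metric`."""
    def strategy(moves, mleft):
        best = min(move[metric] for move in moves)
        return [move for move in moves if move[metric] == best]
    return strategy

def guard_dtz(moves, mleft):
    """Keep moves that can finish the phase within mleft moves.

    Assumption: a move is 'safe' when its dtz is at most mleft. If no move is
    safe, the guard cannot be honoured and the strategy minimises dtz instead.
    """
    safe = [move for move in moves if move["dtz"] <= mleft]
    return safe if safe else minimise("dtz")(moves, mleft)

def compose(*strategies):
    """Apply strategies in turn, each one subsetting the previous selection."""
    def combined(moves, mleft):
        selected = list(moves)
        for strategy in strategies:
            selected = strategy(selected, mleft)
        return selected
    return combined

# Invented candidate moves with their depths after the move is played.
moves = [
    {"name": "a", "dtz": 30, "dtm": 40},
    {"name": "b", "dtz": 12, "dtm": 60},
    {"name": "c", "dtz": 28, "dtm": 35},
]

guard_then_mate_then_zero = compose(guard_dtz, minimise("dtm"), minimise("dtz"))
print([m["name"] for m in guard_then_mate_then_zero(moves, mleft=29)])
print([m["name"] for m in guard_then_mate_then_zero(moves, mleft=10)])
```

With 29 moves left, move a is excluded by the guard and c is preferred on dtm; with only 10 moves left no move is safe, so the guard degenerates to minimising dtz and selects b.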

As conjectured by Haworth (2000), KQPKQ and KBBKNN provide 
positions where all combinations of SC⁻, SM⁻ and SZ⁻ fail to safeguard a win 
available under the 50-move rule: the examples here were found by 
Bourzutschky (2003). Similar positions for other endgames were found by Tamplin 
(2003). Some strategy-driven lines are listed in Appendix 1 after Table 5. 

6.1 New Endgame Strategies 

SZ_50⁻ wins any game winnable against best play under the 50-move 
drawing rule. Here, we suggest ways to finesse wins against fallible 
opposition. If the current phase of play is not unavoidably overlong, strategy 
SZ°Z_50⁻Z⁻, effectively SZ°Z⁻ = SZ⁻, completes it without a draw claim. 

For positions where DTZ_50 indicates draw, the table EZ_50 can be 
supplemented by the position's DTR³ value. Let this hybrid table be EH_50, 
implicitly defining metric DTH_50. Note that EZ_50 is visible within EH_50. 
Since the intention is to use EH_50 only in conjunction with EZ, let the table 
Eδ(H/Z)_50Z = {δ(DT(H/Z)_50, DTZ)}, giving a compact encoding⁴ of E(H/Z)_50 
decodable with the use of EZ. With EδZ_50Z = 0 if EZ_50 = EZ, sub-6-man 
compressed EδZ_50Z EGTs are only 0.7% the size of the DTM EGTs. 

The strategy SZ°H_50⁻ guards the length of the current phase, wins all 
games which are wins under the 50-move rule, and minimaxes DTR, but 
only tactically, when the 50-move rule intervenes. 

In position NN-P3, SZ°H_50⁻ makes the optimal move-choice⁵. In contrast, 
SZ°Z_50⁻ can, and Sσ (σ = C⁻, M⁻, Z⁻, Z°Z_50⁻Z⁻) does, concede DTR depth. 
However, SZ°H_50⁻ has two flaws, the first being a major one. It can draw by 
repeating positions, e.g., position NN-P4⁶. SZ°H_50⁻ should therefore be 
augmented by as deep and perceptive a forward search as possible, denoted 
here by * as in SZ°H_50⁻*. 

3 DTR = Depth by The Rule (Haworth, 2000, 2001), i.e. the minimum k s.t. DTZ_k is a win. 

4 We chose 0 = 'EZ code = Ex_k code', 1 = 'new EZ_50 draw', δ+1 = '0 < DTx - DTZ = δ'. 

5 SZ°H_50⁻ - SH_50⁺: 1. Nb1+' Ka4'. White retains DTR=51 and converts in 31 moves. 

6 NN-P4, SZ°H_50⁻ - SH_50⁺: 1. Nd5+? Kc4' 2. Ndc3 Kb4' (NN-P4 repeated). 
If position NN-P4, with dtz_51 = 25, has just 25 moves left in the phase, it 
also shows SZ°H_50⁻ failing to achieve minimal DTR. The move Nd5+ is 
optimal for SH_50⁻ but DTZ_51-suboptimal, a fact not visible in the EGT EH_50. 
After Nd5+, SZ° limits the move choice and puts a DTR of 51 out of reach. 
Again, forward search helps, this time aiming to control DTR. 

Any strategy can be sharpened by the opponent sensitivity of an adaptive 
opponent model (Haworth, 2003; Haworth and Andrist, 2003). 

7. EGT Integrity 

All EGT files were given md5sum signatures to guard against subsequent 
corruption. The EGTs were checked for errors in various ways. 

- DTx EGTs {Ex}, x = C, Z and Z_50, verified by Nalimov's standard test. 

- consistency of the {E(C/M/Z)} EGTs confirmed 

theoretical values found identical with dtm ≥ dtc ≥ dtz. 

- DTC EGT statistics were also found compatible with those of Wirth. 

- consistency of the {EZ_50} and {EZ} EGTs confirmed 

linear checks confirm EZ_50 = EZ except for known subset, 
values identical with dtz_50 ≥ dtz, or 'EZ' win/loss an 'EZ_50' draw. 
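Two of these checks are easy to sketch. The code below is a hedged illustration rather than the authors' tooling: it computes an MD5 signature with Python's standard `hashlib` (the digest that the md5sum utility reports) and tests the depth ordering for a single won position, reading that ordering as dtm ≥ dtc ≥ dtz, which is an assumption about the intended inequality. The function names are hypothetical.

```python
import hashlib

def md5_of_file(path, chunk_size=1 << 20):
    """MD5 hex digest of a file, read in chunks so large EGT files fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def is_intact(path, recorded_signature):
    """True when the file still matches the signature recorded at generation time."""
    return md5_of_file(path) == recorded_signature.lower()

def depths_consistent(dtm, dtc, dtz):
    """True when a won position satisfies dtm >= dtc >= dtz (assumed ordering)."""
    return dtm >= dtc >= dtz
```

The ordering reflects that mate cannot arrive before the first conversion, and that a conversion is itself a move-count zeroing event, so each successive metric can only be reached sooner or at the same depth.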

8. Summary 

This paper records the separate initiatives of Tamplin (2003) and 
Bourzutschky (2003) in creating new codes capable of generating non-DTM 
EGTs. It also reviews the new DTC/Z/Z_50 data produced by the combination 
of these codes. The DTC, DTZ and DTZ_50 EGTs (EC, EZ and EZ_50) are 
increasingly compact compared to the DTM EGTs, an incidental but 
practical benefit with 3-to-6-man DTM EGTs estimated to be 1 to 2 TB in 
size. 

Together, the sub-6-man compressed EZ and EδZ_50Z EGTs are 53.6% 
the size of the EM EGTs. To date, the equivalent 6-man EGTs are 63.8% the 
size of their EM EGT counterparts but these do not yet involve Pawns. 

Although the computation of DTR data remains a future challenge, table 
EZ_50 may in principle be augmented by DTR values where dtr > 50 to give 
table EH_50. This table may be used to minimise dtz_50 when dtz_50 ≤ 50, and to 
minimax dtr with the assistance of forward-search when dtr > 50. 

Clearly, there are more effective and efficient endgame strategies than 
the commonly used SM⁻. It is recommended that SZ°M⁻, SZ°Z_50⁻Z⁻(*), 
SZ°Z_50⁻Z⁻H_50⁻(*), SZ°H_50⁻* and perhaps other strategies are considered, and 
that the EZ, EδZ_50Z and EδH_50Z EGTs are made available to enable their use. 
Appendix: Chess Endgame Data and Examples 

DTC Metric 

Endgame                    # of maximal positions            max depths, moves 
                             1-0              0-1             1-0         0-1 
Name  GBR      #  w-b     wtm     btm      wtm    btm      wtm  btm    wtm  btm 

KBK   0010.00  3  2-1       0       0        0      0        —    —      —    — 
KNK   0001.00  3  2-1       0       0        0      0        —    —      —    — 
KPK   0000.10  3  2-1       3       2        0      0       19   19      —    — 
KQK   1000.00  3  2-1       1       8        0      0       10   10      —    — 
KRK   0100.00  3  2-1     139     433        0      0       16   16      —    — 
KBKB  0040.00  4  2-2      52      14       14     52        1    0      0    1 
KBKN  0013.00  4  2-2       2       1        1      5        1    0      0    1 
KBKP  0010.01  4  2-2     104      28        6     14        1    0      5    6 
KNKN  0004.00  4  2-2       5       1        1      5        1    0      0    1 
KNKP  0001.01  4  2-2      29       7        3      3        7    6     12   13 
KPKP  0000.11  4  2-2       1       1        1      1       14   14     14   14 
KQKB  1030.00  4  2-2     980   4,837        0      0       12   12      —    — 
KQKN  1003.00  4  2-2       5      19        0      0       19   19      —    — 
KQKP  1000.01  4  2-2       1       1       20     20       26   26      1    2 
KQKQ  4000.00  4  2-2       5       3        3      5       10    9      9   10 
KQKR  1300.00  4  2-2       2      11       55    291       31   31      2    3 
KRKB  0130.00  4  2-2      29       1        0      0       18   18      —    — 
KRKN  0103.00  4  2-2       2       2        1      4       27   27      0    1 
KRKP  0100.01  4  2-2      28      42        3      3       16   16     10   11 
KRKR  0400.00  4  2-2      59     111      111     59        4    3      3    4 
KBBK  0020.00  4  3-1      16      59        0      0       19   19      —    — 
KBNK  0011.00  4  3-1     144     436        0      0       33   33      —    — 
KBPK  0010.10  4  3-1       2       8        0      0       21   21      —    — 
KNNK  0002.00  4  3-1      77      15        0      0        1    0      —    — 
KNPK  0001.10  4  3-1      24      32        0      0       22   22      —    — 
KPPK  0000.20  4  3-1      62      21        0      0       16   16      —    — 
KQBK  1010.00  4  3-1   2,411  14,012        0      0        6    6      —    — 
KQNK  1001.00  4  3-1   4,932  23,203        0      0        7    7      —    — 
KQPK  1000.10  4  3-1      75     175        0      0        7    7      —    — 

KQQK 


2000.00 


4 


3-1 


3,280 


13,005 


0 


0 


3 


3 


— 


— 


KQRK 


1100.00 


4 


3-1 


44 


158 


0 


0 


1 4 


4 


— 


— 


KRBK 


0110.00 


4 


3-1 


1 


6 


0 


0 


12 


12 


— 


— 


KRNK 


0101.00 


4 


3-1 


324 


1,017 


0 


0 


12 


12 


— 


— 


KRPK 


0100.10 


4 


3-1 


376 


1,885 


0 


0 


8 


8 


— 


— 


KRRK 


0200.00 


4 


3-1 


68 


287 


0 


0 


5 


5 


— 


— 


KBBKB 


0050.00 


5 


3-2 


503 


6 


141 


546 


6 


6 


1 


2 


KBBKN 


0023.00 


5 


3-2 


34 


53 


44 


222 


66 


66 


0 


1 


KBBKP 


0020.01 


5 


3-2 


34 


69 


5 


11 


21 


21 


8 


9 


KBBKQ 


3020.00 


5 


3-2 


248 


58 


74 


15 


4 


3 


71 


71 


KBBKR 


0320.00 


5 


3-2 


26 


7 


2 


6 


7 


6 


8 


9 


KBNKB 


0041.00 


5 


3-2 


28 


19 


133 


514 


13 


12 


1 


2 


KBNKN 


0014.00 


5 


3-2 


2 


1 


104 


533 


77 


76 


0 


1 


KBNKP 


0011.01 


5 


3-2 


1 


2 


523 


535 


26 


26 


8 


9 


KBNKQ 


3011.00 


5 


3-2 


79 


1 


22 


4 


5 


5 


42 


42 


KBNKR 


0311.00 


5 


3-2 


127 


23 


4 


2 


6 


5 


12 


13 


KBPKB 


0040.10 


5 


3-2 


14 


14 


508 


1,524 


40 


39 


2 


3 


KBPKN 


0013.10 


5 


3-2 


16 


6 


23 


86 


42 


42 


3 


4 


KBPKP 


0010.11 


5 


3-2 


92 


52 


27 


23 


53 


53 


6 


7 


KBPKQ 


3010.10 


5 


3-2 


30 


30 


3 


2 


4 


3 


42 


42 


KBPKR 


0310.10 


5 


3-2 


76 


53 


5 


6 


13 


12 


20 


21 



Table la. Chess Endgames: 3-to-5-man DTC data. 
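
The GBR column above follows a simple counting rule: for queens, rooks, bishops and knights in turn, the digit is the number of White pieces plus three times the number of Black pieces, and the two digits after the point are the White and Black Pawn counts. The sketch below derives the code from an endgame name. The function `gbr_code` is hypothetical, and the `/wb` suffix emitted for a digit 9 (three or more like pieces on one side) simply mirrors how such entries are printed in the tables of this appendix, e.g. `0090.00/30`.

```python
from collections import Counter

PIECES = "QRBN"  # GBR digit order: queens, rooks, bishops, knights


def gbr_code(name: str) -> str:
    """Return the GBR code for an endgame name such as 'KQPKR'."""
    if not name.startswith("K") or name.count("K") != 2:
        raise ValueError(f"expected a name with exactly two kings: {name!r}")
    second_king = name.index("K", 1)
    white = Counter(name[1:second_king])       # White men other than the king
    black = Counter(name[second_king + 1:])    # Black men other than the king

    digits, suffix = [], []
    for piece in PIECES:
        w, b = white[piece], black[piece]
        if w > 2 or b > 2:
            # Too many like pieces for a single digit: print 9 and spell out
            # the counts after a slash (assumed from the tables' rendering).
            digits.append("9")
            suffix.append(f"{w}{b}")
        else:
            digits.append(str(w + 3 * b))

    code = "".join(digits) + f".{white['P']}{black['P']}"
    if suffix:
        code += "/" + "".join(suffix)
    return code


for endgame in ("KQKR", "KBBKN", "KPPKP", "KBBBK"):
    print(endgame, gbr_code(endgame))
```

The w-b column can be read off the same split: one plus the number of White men before the second king, and one plus the number of Black men after it.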

DTC Metric

                          # of maximal positions          max depths, moves
                                       1-0             0-1       1-0       0-1
Name    GBR        #  w-b      wtm     btm     wtm     btm  wtm  btm  wtm  btm
KNNKB   0032.00    5  3-2      251      82      51     109    4    3    0    1
KNNKN   0005.00    5  3-2       38      18      56     293    7    6    0    1
KNNKP   0002.01    5  3-2        2       4       1       1  114  113   12   13
KNNKQ   3002.00    5  3-2    2,387     465      10       2    1    0   63   63
KNNKR   0302.00    5  3-2        2       1       6      11    3    2   10   11
KNPKB   0031.10    5  3-2       11       3       5      18   31   30    8    9
KNPKN   0004.10    5  3-2        9       2      27     132   48   48    3    4
KNPKP   0001.11    5  3-2        1       6       6       9   33   33   13   14
KNPKQ   3001.10    5  3-2        2       2       1       1    5    4   43   43
KNPKR   0301.10    5  3-2        8      36       7       1   18   18   42   43
KPPKB   0030.20    5  3-2       31      34      14      34   18   17    3    4
KPPKN   0003.20    5  3-2        3       5      21      12   30   29   13   14
KPPKP   0000.21    5  3-2        2      11      66      58   28   28   11   12
KPPKQ   3000.20    5  3-2       14      15      19       8    6    5   30   30
KPPKR   0300.20    5  3-2        1       1       2       3   25   24   25   25
KQBKB   1040.00    5  3-2      220     998     187     645    8    8    1    2
KQBKN   1013.00    5  3-2       74     343      30     153    7    7    0    1
KQBKP   1010.01    5  3-2        5      19     791     789   11   11    1    2
KQBKQ   4010.00    5  3-2       33       1       1       1   30   30   16   17
KQBKR   1310.00    5  3-2        1       6   8,848  52,298   19   19    1    2
KQNKB   1031.00    5  3-2       50     158      28      64    9    9    0    1
KQNKN   1004.00    5  3-2        7      39      31     166    9    9    0    1
KQNKP   1001.01    5  3-2        7       8     928     911   17   17    1    2
KQNKQ   4001.00    5  3-2        7       1       1       4   35   35   13   14
KQNKR   1301.00    5  3-2        1       6      15      86   22   22    2    3
KQPKB   1030.10    5  3-2    1,122   4,328     374   1,290    9    9    1    2
KQPKN   1003.10    5  3-2        1       6       3       9   10   10    1    2
KQPKP   1000.11    5  3-2   11,817  39,633      16      16    6    6    2    3
KQPKQ   4000.10    5  3-2        5      13       2       4  114  113   15   16
KQPKR   1300.10    5  3-2        4      20   5,177  26,128   20   20    2    3
KQQKB   2030.00    5  3-2        4      15       0       0    4    4    —    —
KQQKN   2003.00    5  3-2      287   1,411       0       0    4    4    —    —
KQQKP   2000.01    5  3-2   18,995  19,257     140     140    3    3    1    2
KQQKQ   5000.00    5  3-2        2      21      31     152   25   25    6    7
KQQKR   2300.00    5  3-2        2      12   2,383  16,681   14   14    1    2
KQRKB   1130.00    5  3-2      720   2,556       0       0    5    5    —    —
KQRKN   1103.00    5  3-2      234   1,149      36     149    5    5    0    1
KQRKP   1100.01    5  3-2  104,508 131,846     683     683    3    3    1    2
KQRKQ   4100.00    5  3-2        3      31       1       2   60   60    8    9
KQRKR   1400.00    5  3-2       10      54   8,099  56,501   15   15    1    2
KRBKB   0140.00    5  3-2       35      46     251     951   25   25    1    2
KRBKN   0113.00    5  3-2        9      35     106     481   21   21    0    1
KRBKP   0110.01    5  3-2        2      12       4      12   11   11    4    5
KRBKQ   3110.00    5  3-2        1       3       5       4    7    6   41   42
KRBKR   0410.00    5  3-2       28      19       3      14   59   58    3    4
KRNKB   0131.00    5  3-2        3       6      41      89   25   25    0    1
KRNKN   0104.00    5  3-2        5      18     101     468   24   24    0    1
KRNKP   0101.01    5  3-2       65      81       2       2   15   15   10   11
KRNKQ   3101.00    5  3-2       24       5       7       3    9    8   46   46
KRNKR   0401.00    5  3-2        1       1       1       3   33   32    4    5


DTC Metric

                          # of maximal positions          max depths, moves
                                       1-0             0-1       1-0       0-1
Name    GBR        #  w-b      wtm     btm     wtm     btm  wtm  btm  wtm  btm
KRPKB   0130.10    5  3-2       11      26     502   1,672   62   62    1    2
KRPKN   0103.10    5  3-2        2       7       4      12   46   46    1    2
KRPKP   0100.11    5  3-2      184     474      17      17    9    9   10   11
KRPKQ   3100.10    5  3-2        5       5       5       1    9    8   78   79
KRPKR   0400.10    5  3-2       33       4      23      80   60   60    6    7
KRRKB   0230.00    5  3-2        1       4       0       0   10   10    —    —
KRRKN   0203.00    5  3-2      215     687      45     184    7    7    0    1
KRRKP   0200.01    5  3-2       16      48     988     988    9    9    1    2
KRRKQ   3200.00    5  3-2       14       4       2       3   15   14   20   20
KRRKR   0500.00    5  3-2        3      15   6,210  43,225   25   25    1    2
KBBBK   0090.00/30 5  4-1      116     345       0       0   10   10    —    —
KBBNK   0021.00    5  4-1      783   2,066       0       0   13   13    —    —
KBBPK   0020.10    5  4-1        3       2       0       0   16   16    —    —
KBNNK   0012.00    5  4-1       22      59       0       0   13   13    —    —
KBNPK   0011.10    5  4-1        9      45       0       0   10   10    —    —
KBPPK   0010.20    5  4-1       56      46       0       0   16   16    —    —
KNNNK   0009.00/30 5  4-1       44     180       0       0   21   21    —    —
KNNPK   0002.10    5  4-1      194     296       0       0   15   15    —    —
KNPPK   0001.20    5  4-1        2       5       0       0   12   12    —    —
KPPPK   0000.30    5  4-1       11      35       0       0   11   11    —    —
KQBBK   1020.00    5  4-1      182     673       0       0    6    6    —    —
KQBNK   1011.00    5  4-1   54,680 236,453       0       0    4    4    —    —
KQBPK   1010.10    5  4-1       68     255       0       0    6    6    —    —
KQNNK   1002.00    5  4-1      182     673       0       0    7    7    —    —
KQNPK   1001.10    5  4-1   11,789  56,328       0       0    5    5    —    —
KQPPK   1000.20    5  4-1    1,264   4,476       0       0    6    6    —    —
KQQBK   2010.00    5  4-1   96,576 412,131       0       0    3    3    —    —
KQQNK   2001.00    5  4-1       13      58       0       0    4    4    —    —
KQQPK   2000.10    5  4-1      138     732       0       0    4    4    —    —
KQQQK   9000.00/30 5  4-1    1,513   6,553       0       0    3    3    —    —
KQQRK   2100.00    5  4-1   56,174 218,959       0       0    3    3    —    —
KQRBK   1110.00    5  4-1    1,198   5,865       0       0    4    4    —    —
KQRNK   1101.00    5  4-1    7,474  31,526       0       0    4    4    —    —
KQRPK   1100.10    5  4-1        3      15       0       0    5    5    —    —
KQRRK   1200.00    5  4-1       18      87       0       0    4    4    —    —
KRBBK   0120.00    5  4-1       24     126       0       0   10   10    —    —
KRBNK   0111.00    5  4-1    8,391  26,677       0       0    7    7    —    —
KRBPK   0110.10    5  4-1        1       5       0       0    8    8    —    —
KRNNK   0102.00    5  4-1      602   2,052       0       0   10   10    —    —
KRNPK   0101.10    5  4-1      579   1,436       0       0    8    8    —    —
KRPPK   0100.20    5  4-1        4      24       0       0    8    8    —    —
KRRBK   0210.00    5  4-1    4,761  17,210       0       0    5    5    —    —
KRRNK   0201.00    5  4-1    8,533  29,009       0       0    5    5    —    —
KRRPK   0200.10    5  4-1       16      56       0       0    6    6    —    —
KRRRK   0900.00/30 5  4-1    3,566  13,290       0       0    4    4    —    —

Table 1c. Chess Endgames: 3-to-5-man DTC data.

DTZ Metric

                          # of maximal positions                  max depth, moves
                                           1-0                 0-1       1-0       0-1
Endgame GBR        #  w-b        wtm       btm       wtm       btm  wtm  btm  wtm  btm
KPK     0000.10    3  2-1          8         4         0         0   10   10    —    —
KBKP    0010.01    4  2-2        104        28       779       585    1    0    3    4
KNKP    0001.01    4  2-2         23         6         6         2    6    5    8    8
KPKP    0000.11    4  2-2          1         1         1         1   11   10   10   11
KQKP    1000.01    4  2-2          1         1        20   385,976   26   26    1    1
KRKP    0100.01    4  2-2          2        38         3         3   13   12   10   10
KBPK    0010.10    4  3-1         38        42         0         0   13   13    —    —
KNPK    0001.10    4  3-1        108         8         0         0   13   13    —    —
KPPK    0000.20    4  3-1        125       152         0         0    7    7    —    —
KQPK    1000.10    4  3-1         25       107         0         0    3    3    —    —
KRPK    0100.10    4  3-1      1,643     6,556         0         0    3    3    —    —
KBBKP   0020.01    5  3-2         16        16         5        47   21   21    8    8
KBNKP   0011.01    5  3-2        202        39       494       157   20   20    8    8
KBPKB   0040.10    5  3-2         13        22       508     1,524   25   25    2    3
KBPKN   0013.10    5  3-2         20         5        23        86   30   30    3    4
KBPKP   0010.11    5  3-2          9         4        24        30   37   37    5    6
KBPKQ   3010.10    5  3-2      1,438        30         1         2    3    3   42   42
KBPKR   0310.10    5  3-2          5        39         5         6   13   12   18   19
KNNKP   0002.01    5  3-2         18        13         1         1   82   81   11   11
KNPKB   0031.10    5  3-2         39        33         5        18   24   24    8    9
KNPKN   0004.10    5  3-2          2        25        27       132   30   29    3    4
KNPKP   0001.11    5  3-2          1         1        12         4   23   23    7    7
KNPKQ   3001.10    5  3-2      2,459         4         1         1    3    3   43   43
KNPKR   0301.10    5  3-2          8        36         3         9   18   18   39   40
KPPKB   0030.20    5  3-2          2         5         1        13   12   12    1    2
KPPKN   0003.20    5  3-2          3         8        45       100   14   13    6    7
KPPKP   0000.21    5  3-2          1         3         1         4   21   21    7    7
KPPKQ   3000.20    5  3-2          8        15        19        16    6    5   29   29
KPPKR   0300.20    5  3-2         67        83        13        14   14   14   15   15
KQBKP   1010.01    5  3-2          5        14       791 2,934,215   11   11    1    1
KQNKP   1001.01    5  3-2          7         1       928 5,722,853   17   17    1    1
KQPKB   1030.10    5  3-2     13,462    65,629       374     1,290    5    5    1    2
KQPKN   1003.10    5  3-2         26       105         3         9    6    6    1    2
KQPKP   1000.11    5  3-2         69         2     1,024 7,412,631    5    5    1    1
KQPKQ   4000.10    5  3-2          1         3         2         4   71   70   15   16
KQPKR   1300.10    5  3-2          3        19     5,177    26,128   17   17    2    3
KQQKP   2000.01    5  3-2     13,425     1,987       140    16,368    3    3    1    1
KQRKP   1100.01    5  3-2     76,181     2,592       683   892,287    3    3    1    1
KRBKP   0110.01    5  3-2          2        10         4         8   11   11    4    5
KRNKP   0101.01    5  3-2         19        26         2         2   15   14   10   10
KRPKB   0130.10    5  3-2          5         7       502     1,672   53   53    1    2
KRPKN   0103.10    5  3-2          8        15         4        12   31   31    1    2
KRPKP   0100.11    5  3-2         20        22        17        18    9    9   10   10
KRPKQ   3100.10    5  3-2          2         5         3         1    9    8   75   76
KRPKR   0400.10    5  3-2          3         4        14        43   35   35    6    7
KRRKP   0200.01    5  3-2         16        32       988 1,506,491    9    9    1    1
KBBPK   0020.10    5  4-1          5         1         0         0   12   12    —    —
KBNPK   0011.10    5  4-1         74       199         0         0    5    5    —    —
KBPPK   0010.20    5  4-1         16        32         0         0    9    9    —    —
KNNPK

DTZ Metric

                          # of maximal positions                  max depth, moves
                                           1-0                 0-1       1-0       0-1
Endgame GBR        #  w-b        wtm       btm       wtm       btm  wtm  btm  wtm  btm
KNPPK   0001.20    5  4-1          1         7         0         0    6    6    —    —
KPPPK   0000.30    5  4-1         16        64         0         0    7    7    —    —
KQBPK   1010.10    5  4-1      2,085     6,415         0         0    3    3    —    —
KQNPK   1001.10    5  4-1        958     4,181         0         0    3    3    —    —
KQPPK   1000.20    5  4-1         20        88         0         0    3    3    —    —
KQQPK   2000.10    5  4-1         29        81         0         0    3    3    —    —
KQRPK   1100.10    5  4-1      2,330     6,022         0         0    3    3    —    —
KRBPK   0110.10    5  4-1         67       114         0         0    4    4    —    —
KRNPK   0101.10    5  4-1         36       152         0         0    4    4    —    —
KRPPK   0100.20    5  4-1        270       651         0         0    3    3    —    —
KRRPK   0200.10    5  4-1      6,122    11,124         0         0    3    3    —    —

Table 2b. Chess Endgames: 3-to-5-man DTZ data.

DTZ50 Metric

                          # of maximal positions                  max depth, moves
                                           1-0                 0-1       1-0       0-1
Endgame GBR        #  w-b        wtm       btm       wtm       btm  wtm  btm  wtm  btm
KBBKN   0023.00    5  3-2    347,796   485,538        44       222   50   50    0    1
KBBKP   0020.01    5  3-2         16        16         3         4   21   21    9   10
KBBKQ   3020.00    5  3-2        248        58    86,896    24,793    4    3   50   50
KBNKN   0014.00    5  3-2     12,123     5,857       104       533   50   50    0    1
KBNKP   0011.01    5  3-2        202        39       494       157   20   20    8    8
KBPKN   0013.10    5  3-2         20         5        23        86   30   30    3    4
KNNKP   0002.01    5  3-2     60,080    12,023         1         1   50   50   11   11
KNNKQ   3002.00    5  3-2      2,387       465     6,352     2,010    1    0   50   50
KNPKN   0004.10    5  3-2          2        25        27       132   30   29    3    4
KNPKQ   3001.10    5  3-2      2,459         4         1         1    3    3   43   43
KPPKP   0000.21    5  3-2          1         3         1         4   21   21    7    7
KPPKQ   3000.20    5  3-2          8        15        19        16    6    5   29   29
KQPKP   1000.11    5  3-2         69         2     1,024 7,412,631    5    5    1    1
KQPKQ   4000.10    5  3-2      1,595     2,415         2         4   50   50   15   16
KQRKP   1100.01    5  3-2     76,181     2,592       683   892,287    3    3    1    1
KQRKQ   4100.00    5  3-2         23       156         1         2   50   50    8    9
KRBKR   0410.00    5  3-2      1,041       175         3        14   50   50    3    4
KRPKB   0130.10    5  3-2        130       254       502     1,672   50   50    1    2
KRPKP   0100.11    5  3-2         20        22        17        18    9    9   10   10
KRPKQ   3100.10    5  3-2          2         5     9,275     4,898    9    8   50   50

Table 3. Chess Endgames: 3-to-5-man data where EZ50 ≠ EZ.

DTZ Metric

                          # of maximal positions          max depth, moves
                                       1-0             0-1       1-0       0-1
Endgame GBR        #  w-b      wtm     btm     wtm     btm  wtm  btm  wtm  btm
KBBKNN  0026.00    6  3-3       11       1     488   1,518   38   38    3    4
KQQKBB  2060.00    6  3-3      984   5,128     137     714    6    6    3    4
KQQKNN  2006.00    6  3-3        2       8       1  36,110    7    7    1    1
KQQKQR  5300.00    6  4-2        4       2       1      12   48   47   56   56
KRRKRB  0530.00    6  3-3       22      13       1     455   54   54    6    6
KBBBKN  0093.00/30 6  4-2        6       6     951   4,838   12   12    0    1
KBBBKQ  3090.00/30 6  4-2        1       9       1       3   10    9   51   51
KBBNKN  0024.00    6  4-2        9      54   3,663  18,984   31   31    0    1
KBNNKN  0015.00    6  4-2       17      56   4,335  22,890   28   28    0    1
KBNNKQ  3012.00    6  4-2        5       1       1       4   12   11   49   49
KNNNKQ  3009.00/30 6  4-2        1       1       6      11    9    8   35   35
KQNNKQ  4002.00    6  4-2        2       2       5      20   71   71   13   14
KRNNKQ  3102.00    6  4-2        2       1       2       3   28   27   41   41

Table 4. Chess Endgames: some 6-man DTZ data.



DTZ50 Metric

                          # of maximal positions          max depth, moves
                                       1-0             0-1       1-0       0-1
Endgame GBR        #  w-b      wtm     btm     wtm     btm  wtm  btm  wtm  btm
KBBKNN  0026.00    6  3-3       46      17     488   1,518   29   28    3    4
KQQKBB  2060.00    6  3-3        1       5     137     714    8    8    3    4
KQQKNN  2006.00    6  3-3        2       8       1  36,110    7    7    1    1
KQQKQR  5300.00    6  4-2        4       2       6      26   48   47   50   50
KRRKRB  0530.00    6  3-3      372     107       1     455   50   50    6    6
KBBBKN  0093.00/30 6  4-2        3       6     951   4,838   14   14    0    1
KBBBKQ  3090.00/30 6  4-2        1       9      11      15   10    9   50   50
KBBNKN  0024.00    6  4-2        9      54   3,663  18,984   31   31    0    1
KBNNKN  0015.00    6  4-2        3       3   4,335  22,890   29   29    0    1
KBNNKQ  3012.00    6  4-2        5       1       1       4   12   11   49   49
KNNNKQ  3009.00/30 6  4-2        1       1       6      11    9    8   35   35
KQNNKQ  4002.00    6  4-2   10,534   9,796       5      20   50   50   13   14
KRNNKQ  3102.00    6  4-2        2       1       2       3   28   27   41   41

Table 5. Chess Endgames: some 6-man DTZ50 data.



The following lines, starting from some positions listed in Table 7 below, show strategies 
variously retaining the win, failing to retain the win, repeating positions to draw or being 
suboptimal. They include an established notation showing the criticality of the moves: 

" = unique value-preserving move ; ' = only optimal move ; ° = only legal move. 

KBBKP position BB-P1 - dtz = 1m, dtz50 = 7m:

S<(> “Sa, a = C‘, M‘ or Z‘: 1. ... a1Q+?? {dtz = 51m; White can force a 50m draw} ½-½.
SZ 50 + ~SZ 50 ‘: 1. ... Kc4" 2. Bf3+ Kc3 3. Be1+' Kd4" 4. Bf2+' Ke5' 5. Bg3+' Kf6' 6. Bh4+'
Kg7' {dtm = 17m} 0-1.

KNNKP position NN-P1 - dtz = 20m, dtc = 63m, dtm = 64m, dtz50 = 44m:

S(C, M, Z)'ct “ SZ 50 + : 1. Ng1?? h3" {dtz = 61m; Black can force a 50m draw} ½-½.

SZ 50 ' “ SZ 50 + : 1. Ngf2' Ke3' 2. Kc3' Ke2' 3. Kd4' Kd2' 4. Ne4+' Ke2' 5. Neg5' Kd2* 6.
Nf3+' Ke2' 7. Ke4' Kf1' 8. Kd3 Kg2° 9. Nfg5' Kg3' 10. Ke3 Kg4' 11. Ke4' Kg3' ... 1-0.









KNNKP position NN-P2 - dtz = 1m, dtz50 = 43m:

SZ'ct “St: 1. Nbc4'?? {dtz = 58m; Black can force a 50m draw} ½-½.

S(C/M)'ct — SZ 50 + : 1. Na4" {dtz50 = 42m, dtm = 88m} Kd2° 1-0.

SZ 50 ' ”SZ 50 + : 1. Na4" Kd2° 2. Nc4+ Kd3' 3. Ncb2+ Kd2 4. Kb1 Ke3' 5. Kc1 Ke2 6. Kc2'
Ke3' 7. Kc3' Ke4' 8. Nd3 Ke3 9. Ndc5' Kf4' 10. Kd4' Kf5' 11. Nd3 Ke6 12. Ke4' Kd6 13. Nf4
Kc6 14. Ne2' Kd6' 15. Nd4' Ke7' 16. Ke5' Kf7' 17. Kf5' Ke7 18. Nb5' Kd7 19. Ke5' Kc6' 20.
Na3' Kd7' 21. Nc5+' Ke7' 22. Ne4' Kf7' 23. Kd6' Kg7 24. Ke6' Kg6' 25. Nc4' Kg7' 26. Ned2
Kg6 27. Nf3' Kh6 28. Kf5' Kg7 29. Ng5' Kf8 30. Kf6' Ke8' 31. Ke6' Kf8' 32. Nh3' Kg8' 33.
Nf4 Kg7' 34. Ke7' Kh6 35. Kf6' Kh7' 36. Ne2 Kh6 37. Ng3' Kh7' 38. Nf5' Kg8 39. Ke7' Kh8'
40. Ne5 Kh7 41. Ke8 Kg8 42. Ng6' Kh7' 43. Kf7' a4° {dtm = 3m} 1-0.

KNNKP position NN-P3 - dtz = 1m, dtz50 indicates 'draw', dtr = 51m, dtz51 = 31m:

SZ°ct - S(p, ct = C\ M‘, Z or Z 50 ‘: 1. Kc2? {dtr > 51m}.

SZ°H 50 ‘ - SH 50 + : 1. Nb1+' {dtr = 51m, controlling DTR} Ka4'

KNNKP position NN-P4 - dtz = 16m, dtz50 indicates 'draw', dtz51 = 25m, mleft = 25m:
SZ°H 50 ‘ - SH 50 + : 1. Nd5+? {dtz51 = 26m} Kc4' 2. Ndc3 Kb4' {NN-P4 repeated} ½-½.

KQPKQ position QP-Q1 - dtc = 52m, dtz = 1m, dtz50 = 50m:

Sa ” St, a = C', M‘ or Z‘: 1. b7'?? {dtz = 51m; Black can force a 50m draw} ½-½.

SZ 50 - - SZ 50 + : 1. Qg5" Qe4' 2. Kc5" Qc2+' 3. Kd5 Qb3+ 4. Kc6' Qe6+' 5. Kc5' Qc8+' 6.
Kd4' Qh8+' 7. Kc4' Qh7 8. Qd5' Qc2+' 9. Kb4 Qb2+' 10. Kc5' Qa3+ 11. Kc6' Qa4+' 12. Kd6
Qf4+' 13. Kd7' Qg4+' 14. Qe6' Qg7+' 15. Kd6' Qg3+' 16. Kc5' Qg5+' 17. Kc4 Qc1+' 18. Kd5'
Qb2' 19. Qg6' Qb5+' 20. Kd4" Qb4+' 21. Ke5' Qc5+' 22. Kf4" Qd4+' 23. Kf5 Qc5+' 24. Kg4'
Qd4+' 25. Kh5' Qd5+' 26. Kh6' Ke1' 27. Qg1+' Ke2' 28. Qg4+' Kf1' 29. Qg5' Qc6+ 30. Qg6"
Qb7' 31. Qf6+' Ke2' 32. Kg5' Ke3' 33. Qe5+ Kf2' 34. Qc5+' Ke2' 35. Kf4' Qf3+' 36. Ke5'
Qg3+ 37. Ke6' Qh3+ 38. Kd6' Qh6+' 39. Kc7' Qg7+ 40. Kc6' Qf6+ 41. Qd6' Qc3+' 42. Kd7'
Qf3' 43. Kc8 Qc3+ 44. Kd8' Qa5' 45. Ke7' Qb5 46. Qf6' Qb1 47. Kf7 Qh7+ 48. Kf8' Qb1 49.
Qe7+* Kd1 50. b7' {dtm = 21m} 1-0.

KRPKP position RP-P2 - dtz = 1m, dtz50 = 6m:

S(p “ Scft, ct = C", M‘ or Z': 1. ... g1Q'?? {dtr > 50m; White can force a 50m draw} ½-½.
SZ 50 + ”SZ 50 ‘: 1. ... Kb2" 2. Rb4+' Kc2" 3. Rc4+' Kd2' 4. Rd4+' Ke2' 5. Re4+' Kf2" 6. Re7
g1Q" {dtm = 49m} 0-1.

KRPKQ position RP-Q1 - dtz = 2m, dtz50 = 21m:

S(p “ Sar, ct = C' or M‘: 1. ... Qd6+'?? {dtr > 50m; White can force a 50m draw} ½-½.
S<p”SZ't: 1. ... Qe4+'?? {dtr > 50m; White can force a 50m draw} ½-½.

SZ 50 + “SZ 50 ': 1. ... Qe6+" 2. Kg5' Qg8+" 3. Kh6' Qd5' 4. Rg7' Qh1+" 5. Kg6' Qg1+' 6.
Kf7' Qf1 7. Rg6+' Kb7' 8. Rf6' Qg2 9. Ke6 Qe4+' 10. Kd6 Kb6 11. Rf7' Kb5' 12. Rf6 Kc4' 13.
Rf7' Kd4' 14. Rf8' Qd5+' 15. Ke7' Qc5+' 16. Kf7 Kd5" 17. Kg7 Qg1+' 18. Kf6 Qg4' 19. Ke7'
Qe6+' 20. Kd8° Kc6 21. Rf6 Qxf6+" {dtm = 2m} 0-1.

KBBKNN position BB-NN1 - dtz = 1m, dtz50 = 28m:

Sctt ~S<p, ct = C\ M‘ or Z‘: 1. Bxg6'?? {dtz = 54m; Black can force a 50m draw} ½-½.
SZ 50 '“SZ 50 + : 1. Bd6" Nh8' 2. Bc6+" Ka5° 3. Kb3" Nc1+' 4. Kc4" Nf7' 5. Bc7+" Ka6° 6.
Bd5" Nh8' 7. Bf3' Ng6' 8. Bd6" Nh4' 9. Be4" Ne2' 10. Bh2" Ka5' 11. Bc7+' Ka6' 12. Kc5'
Ka7' 13. Bd3' Ng1' 14. Bg3 Ng2' 15. Kc6' Nh3' 16. Bf1' Nhf4' 17. Bf2+" Kb8' 18. Bb6' Ka8'
19. Ba6' Kb8' 20. Bc4' Nh5' 21. Bc7+' Ka7 22. Be5' Nhf4' 23. Bd6' Nh5 24. Kc7' Nf6' 25.
Bc5+' Ka8° 26. Bb5 Nd5+' 27. Kc8" Ne1 28. Bc6#' 1-0.

KQQKBB position QQ-BB1 - dtz = 2m, dtz50 = 7m:

SZ' ~S<p: 1. Qxd4+'?? Bxd4+ {dtz = 67m; Black can force a 50m draw} ½-½.

SZ 50 ” - SZ 50 + : 1. Kb1" Be4+' 2. Ka2" Bd5+' 3. Ka3' Bd6+' 4. Ka4' Bc6+' 5. Ka5 Kc3 6.
Qc1+' Kb3 7. Qxc6' {dtm = 2m} 1-0.

KBNNKQ position BNN-Q1 - dtz = 1m, dtz50 = 36m:

S<p Sa, ct = C', M" or Z": 1. ... Qxa1'?? {dtz = 52m; Black can force a 50m draw} ½-½.
SZ50' “ SZ 50 + : 1. ... Qh7+" 2. Kd2' Qd7+" 3. Kc3' Ke2' 4. Bb2' Qg4" 5. Kb3' Qe6" 6. Kc3'
Qe4" 7. Kb3' Qg4' 8. Kc3' Qf4' 9. Kb3' Qb8+' 10. Kc2' Qb4' 11. Na3' Qe4+" 12. Kb3' Qd5+'
13. Kc3' Qf3+' 14. Kc4' Kd1 15. Kb4' Qb7+" 16. Nb5' Kc2' 17. Bd4' Qe7+' 18. Kc4' Qe6+'
19. Kc5' Qf5+' 20. Kc4' Qc8+' 21. Kb4' Qf8+' 22. Ka4 Qg8' 23. Kb4 Kd3' 24. Bc3' Qd5' 25.
Bd4 Qc4+' 26. Ka5' Qg8' 27. Ka4' Qa8+' 28. Kb4' Qf8+' 29. Kb3' Qe7' 30. Bb2' Qe6+' 31.
Ka4 Qa2+ 32. Ba3' Qc4+' 33. Ka5 Qd5' 34. Kb4' Qe4+ 35. Ka5 Qa8+' 36. Kb6 Qxh8 {dtm =
22m} 0-1.

KQNNKQ position QNN-Q1 - dtz = 3m, dtz50 = 4m, dtm = 5m:

SZ ~SZ + . 1. Qa3+'?? Kd1' 2. Qa1+" Ke2° 3. Qxh1" {dtz = 52m}
SZ 5 o‘”SZ 5 o + : 1. Qe3+" Kb1' 2. Qb6+" Kc1' 3. Qb2+' Kd1° 4. Qd2# 1-0.
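
The criticality marks attached to the moves in the lines above can be tallied mechanically, which gives a rough measure of how forced a line is. The following is a minimal sketch, assuming the marks have been normalised to the three plain characters of the legend; the function name `criticality_counts` and the label "unmarked" are hypothetical and not part of the paper's notation.

```python
import re
from collections import Counter

# Marks as defined in the legend preceding the example lines.
MARKS = {'"': "unique value-preserving", "'": "only optimal", "°": "only legal"}


def criticality_counts(line: str) -> Counter:
    """Tally the criticality mark (if any) carried by each move in `line`."""
    counts: Counter = Counter()
    # Drop {...} comments such as {dtm = 17m} before splitting into tokens.
    for token in re.sub(r"\{[^}]*\}", " ", line).split():
        # Skip move numbers ("1."), ellipses ("...") and results ("0-1").
        if token[0].isnumeric() or token.startswith("..."):
            continue
        move = token.rstrip("?!")  # ignore blunder/quality annotations
        if move:
            counts[MARKS.get(move[-1], "unmarked")] += 1
    return counts


# The second KBBKP BB-P1 line, with OCR damage repaired by hand.
line = (
    '1. ... Kc4" 2. Bf3+ Kc3 3. Be1+\' Kd4" 4. Bf2+\' Ke5\' '
    "5. Bg3+' Kf6' 6. Bh4+' Kg7' {dtm = 17m} 0-1."
)
print(criticality_counts(line))
```

Splitting on whitespace rather than parsing full algebraic notation keeps the sketch short; it relies on every move being a separate token, so a line with fused tokens would need to be cleaned first.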



                                                                            % of nominal wins
                          # extra draws                 # delayed   extra draws       delayed
Endgame res.           wtm          btm          wtm          btm    wtm    btm    wtm    btm
KBBKN   1-0      3,993,656    7,852,543            0            0  21.05  48.20      0      0
KBBKP   1-0            171          687        3,889        1,800      ε      ε   0.01      ε
        0-1        119,226    1,444,441        1,524        3,741   5.85   8.47   0.07   0.02
KBBKQ   0-1      2,154,114      490,797            0            0   8.49   1.46      0      0
KBNKN   1-0        139,893       72,483            0            0   0.52   1.93      0      0
KBNKP   1-0            185          275        1,641        1,685      ε      ε      ε      ε
KBPKN   1-0            257          264          602        1,530      ε      ε      ε      ε
KNNKP   1-0     10,684,968    9,495,721   17,093,973    6,239,778  26.35  46.87  42.16  30.80
        0-1          4,255       10,877          301          357   0.14   0.06   0.01      ε
KNNKQ   0-1         11,990        3,667            0            0   0.05   0.01      0      0
KNPKN   1-0             61           86           48           39      ε      ε      ε      ε
KNPKQ   0-1              1            0            0            0      ε      0      0      0
KPPKP   1-0          1,834        2,062          149           55      ε      ε      ε      ε
KPPKQ   1-0          1,641            3            0            0   0.01   0.01      0      0
KQPKP   1-0             19        3,266        2,664        2,207      ε      ε      ε      ε
KQPKQ   1-0         28,468       22,411       42,756       28,526   0.02   0.08   0.03   0.10
KQRKP   1-0              0           79            0            0      0      ε      0      0
KQRKQ   1-0            230        1,106            0            0      ε      ε      0      0
KRBKR   1-0          2,263          725            0            0   0.01   0.02      0      0
KRPKB   1-0             35           83           53           74      ε      ε      ε      ε
KRPKP   1-0              0          240          124           33      0      ε      ε      ε
        0-1            679       12,137           26           30   0.14   0.05   0.01      ε
KRPKQ   1-0          1,592            1          116            0      ε      ε      ε      0
        0-1         72,802       29,723       26,336        9,097   0.06   0.02   0.02      ε
KBBKNN  1-0    141,874,223   38,562,549    4,961,624    1,402,773  50.15  70.98   1.75   2.58
KQQKBB  1-0         23,343    6,776,509    1,244,572    5,432,160      ε   0.58   0.18   0.47
KQQKNN  1-0            130       44,687        4,704       22,000      ε      ε      ε      ε
KQQKQR  0-1         17,313       41,775       42,552       66,504   0.02   0.01   0.04   0.01
KRRKRB  1-0            380          145            0            0      ε      ε      0      0
        0-1            396       11,281           30          799   0.02   0.03      ε      ε
KBBBKN  1-0        743,762   37,035,833   55,589,963  161,070,140   0.15   6.16  11.28  26.80
KBBBKQ  0-1     21,650,797   31,223,711    6,004,068   11,096,464  15.04   6.15   4.17   2.19
KBBNKN  1-0        640,358   36,582,112  136,891,517  318,970,567   0.03   1.74   6.44  15.17
KBNNKN  1-0         96,123    1,016,653   10,322,215   13,062,956      ε   0.05   0.46   0.70
KBNNKQ  0-1        178,774      178,631      179,015      143,015   0.03   0.01   0.03   0.01
KNNNKQ  0-1        125,488      181,848       91,063       99,907   0.09   0.04   0.07   0.02
KQNNKQ  1-0         49,329       38,050            0            0      ε   0.01      0      0
        0-1          1,538      206,733            0            2   0.04   0.05      0      ε
KRNNKQ  0-1         33,448      252,183       10,270       30,764   0.04   0.03   0.01      ε

Table 6. The impact of the 50-move drawing rule.

                                              depth in plies
Key     res. Position                    stm    dtc  dtm  dtr  dtz  dtz50  Notes
EZ50 ≠ EZ
BB-N    1-0  8/8/8/7B/4k3/4B3/3K4/1n6    w      119  143  119  119      —  q.v. BB-P
BB-P    1-0  8/8/8/7B/4k3/4B3/1p1K4/8    b        6  144  119    6      —  1. ... b1=N+ {BB-N}
        0-1  8/8/6B1/3K4/5B2/8/p7/3k4    b        1  157  136    1      —  1. ... a1=Q" {BB-Q}
BB-Q    0-1  8/8/6B1/3K4/5B2/8/8/q2k4    w      136  156  136  136      —  q.v. BB-P
BN-N    1-0  8/8/3K4/8/8/3B4/k7/1n1N4    w      139  199  139  139      —  q.v. BN-P
BN-P    1-0  8/8/3K4/8/8/3B4/kp6/3N4     b        9  200  139    9      —  1. ... b1=N {BN-N}
BP-N    1-0  1n6/3P4/8/8/1K6/7B/8/k7     w        1  199  138    1      —  1. d8=N" {dtz = 138p}
NN-P    1-0  K1k5/3N1N2/8/8/4p3/8/8/8    w      169  169  164  164      —  maxDTZ pos.
        0-1  3k3N/3N4/3K4/8/8/8/7p/8     b        1  145  126    1      —  1. ... h1=Q" {NN-Q}
NN-Q    0-1  3k3N/3N4/3K4/8/8/8/8/7q     w      126  144  126  126      —  q.v. NN-P
NP-N    1-0  kn6/3P4/1K6/8/8/8/3N4/8     w        1  191  130    1      —  1. d8=B" {dtz = 130p}
NP-Q    0-1  1k1K4/4P1N1/8/8/8/6q1/8/8   w        6  124  103    6      —  1. e8=N {dtz = 103p}
PP-P    1-0  8/4P3/8/8/8/4P3/kp1K4/8     b        2  244  102    2      —  1. ... b1=Q {PP-Q}
PP-Q    1-0  8/4P3/8/8/8/4P3/k2K4/1q6    w        1  243  102    1      —  1. e8=Q" {QP-Q}
QP-P    1-0  8/4Q3/8/8/8/K7/6Pp/5k2      w        5  191    ?    1      —
QP-Q    1-0  4Q3/8/8/8/8/4P3/k2K4/1q6    b      222  242  102  102      —  q.v. PP-Q
QR-P    1-0  Q7/2k5/8/8/8/8/R2p4/K7      b        2  134  119    2      —  1. ... d1=Q {QR-Q}
QR-Q    1-0  Q7/2k5/8/8/8/8/R7/K2q4      w      119  133  119  119      —  q.v. QR-P
RB-R    1-0  8/3B4/8/1R6/5r2/8/3K4/5k2   w      117  129  117  117      —  maxDTZ pos.
RP-B    1-0  K1R5/8/3k4/3P4/8/8/1b6/8    w      113  131  105  105      —  maxDTZ pos.
RP-P    1-0  6R1/P6K/1k6/8/8/8/3p4/8     b        1  136  120    1      —  1. ... d1=Q" {dtz = 120p}
        0-1  8/8/8/5PR1/8/2K5/5p2/k7     w        2  188  130    2      —  1. Kd4" f1Q" {dtz = 130p}
RP-Q    1-0  6R1/P7/2q5/2k5/8/8/8/6K1   b        2  118  102    2      —  only frustrated btm 1-0 pos.
        0-1  8/7R/6K1/8/5P2/8/8/k6q      b      116  165  107    3      —
BB-NN   1-0  8/6B1/8/8/2B1n3/6K1/3k3n/8  w        1  147  122    1      —  1. Kxh2" {dtz = 122p}
QQ-BB   1-0  8/8/8/4b3/8/Q7/2k1b3/K5Q1   w        2  143  121    2      —  1. Qc3" Bxc3+ {dtz = 121p}
QQ-NN   1-0  8/8/8/8/1Q6/3n4/2n3k1/K3Q3  w        3  135  113    3      —  1. Kb1" Ncxe1 {dtz = 113p}
QQ-QR   0-1  8/Q7/1Q6/8/r7/8/8/qK5k      w        2  132  116    2      —  1. Kc2° Rxa7" {dtz = 116p}
RR-RB   1-0  3R4/8/R7/8/8/8/6r1/k3K2b    b      102  122  102  102      —
        0-1  8/R7/8/4b3/8/1r6/R7/K3k3    w        2  116  102    2      —  1. Rb2° Rxb2" {dtz = 102p}
BBB-N   1-0  8/8/8/8/8/8/2B1n3/K1k3BB    w        2  145  119    2      —  1. Bb6 Kxc2 {dtz = 119p}
BBB-Q   0-1  8/8/8/8/q7/3BB3/8/K2kB3     w        2  142  120    2      —  1. Kb2 Kxe1 {dtz = 120p}
BBN-N   1-0  8/8/8/8/8/8/2N2B2/K1kn3B    w        2  141  115    2      —  1. Bb6 Kxc2 {dtz = 115p}
BNN-N   1-0  n7/8/8/8/8/6B1/6N1/K4kN1    w        2  181  119    2      —  1. Ne3+" Kxg1° {dtz = 119p}
BNN-Q   0-1  q7/8/8/8/8/N7/3N4/K1kB4     w        2  126  104    2      —  1. Ndc4 Kxd1" {dtz = 104p}
NNN-Q   0-1  8/8/2q5/8/8/N7/3N4/K1kN4    w        2  126  104    2      —  1. Ndc4 Kxd1" {dtz = 104p}
QNN-Q   1-0  7q/1Q6/8/5N2/8/8/8/K1k4N    w      101  107  101  101      —  1. Ng7" ...
        0-1  8/8/8/1N6/8/8/N7/kqK2Q2     w        2  124  104    2      —  1. Kd2° Qxf1" {dtz = 104p}


RNN-Q 


0-1 


8/8/lR6/q7/3N4/8/4N3/K2k4 


w 


2 


122 


102 


2 


— 


1 . Kb2 Qxb 6 +" {dtz = 102p 


Strategy Failure Positions 1 














SZ 50 ok if dtz 50 cited 


BB-P1 


1-0 


8/8/8/1 k6/8/8/p4BB 1/3K4 


b 


1 


123 


58? 


1 


13 


S(Co/M/Zo) x 


NN-P1 


1-0 


8/8/8/8/2K3Np/7N/3k4/8 


w 


126 


127 


40 


88 


88 


S(C/M/Z)o x 


NN-P2 


1-0 


8/8/1 N6/p7/8/4N3/8/Klk5 


w 


176 


177 100? 


2 


86 


SZo x; S(C/M) ok 


NN-P3 


1-0 


8/8/8/2pN4/ 8 /k 1 N5/8/2K5 


w 


115 


115 


102 


2 


— 


SZ°H 50 ': 1. Nbl+' 


NN-P4 


1-0 


8/8/8/2p5/lk6/2N5/2K5/lN6 


w 


113 


113 


102 


32 


— 


SZ°H 50 ' repeats positions 


QP-Q1 


1-0 


8/8/1 P5Q/1 K6/3q4/8/5k2/8 


w 


103 


125 


99 


1 


99 


S(C/M/Z)g x 


RP-P1 


1-0 


6Rl/8/Pk6/8/8/8/p2K4/8 


w 


3 


31 


26? 


1 


5 


S(C/Z)o x; SM ok 


RP-P2 


1-0 


8/8/5K2/8/2R2P2/8/6pl/k7 


b 


1 


159 


? 


1 


11 


S(C/M/Z)o x 


RP-Q1 


1-0 


8/4q2R/k5Kl/8/5P2/8/8/8 


b 


113 


163 


? 


3 


41 


S(C/M/Z)o x 


BB-NN1 


1-0 


8/8/6nl/8/k3BB2/8/nlK5/8 


w 


1 


133 


55 


1 


55 


S(C/M/Z)cr x 


QQ-BB 1 


1-0 


8/Q7/8/3bb3/8/8/3k4/K4Q2 


w 


3 


17 


13 


3 


13 


SZ‘x 


BNN-Q 1 


0-1 


7N/6q l/8/8/2N5/3Klk2/8/B7 


b 


1 


125 


7 


1 


71 lS(C/M/Z)o x 


QNN-Q 1 


0-1 


8/2N5/8/2q5/5N2/2k5/8/2K4Q 


b 


5 


9 


7 


5 


7 |S(C/Z)g x; SM ok 



Table 7. Example Positions. 7 



7 Without a DTR EGT, it is not always possible to determine dtr precisely. 
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Abstract In this article a neural network architecture is presented that is able to build a soft 

segmentation of a two-dimensional input. This network architecture is applied to 
position evaluation in the game of Go. It is trained using self-play and temporal 
difference learning combined with a rich two-dimensional reinforcement signal. 
Two experiments are performed, one using the raw board position as input, the 
other one doing some simple preprocessing of the board. The second network is 
able to achieve playing strength comparable to a 13-kyu Go program. 

Keywords: Go, neural networks, segmentation, connectivity, NeuroGo 

1. Evaluating Go Positions 

Writing a program that plays the game of Go is a notoriously hard problem. 
Despite many efforts the best programs still play at a weak to medium amateur 
level of about 8 kyu (Schaeffer, 2001). This is not only due to the large branching
factor but also to the fact that the evaluation of Go positions is difficult. State- 
of-the-art programs rely on a knowledge intensive approach. They use large 
databases of patterns, rule-based systems, and hand-tuned heuristics (Bouzy 
and Cazenave, 2001). 

1.1 Simplification by Segmentation 

The evaluation of a Go position can be simplified by segmenting the position 
into parts. This works well in positions with independent subgames where 
playing one subgame does not affect the value of other subgames. Typical
examples are Go endgame positions, to which combinatorial game theory has
been applied successfully (Müller and Gasser, 1996). Positions without a clear
segmentation are more difficult: in particular, middle-game positions with many 
possible continuations each leading to different follow-up segmentations, and 
positions with multiple nearby tactical fights. Many Go programs use some 
influence-based segmentation of positions (Chen, 2002). 
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Cognitive studies on human Go players have shown that humans perceive Go 
positions not as a set of hierarchically structured patterns with clear boundaries
but rather as a set of overlapping clusters (Reitman, 1976). 

1.2 Neural Networks 

Neural networks have been used for evaluating full-board Go positions. 
Schraudolph, Dayan, and Sejnowski (1994) used temporal-difference learning 
(Sutton, 1988) to train a neural network to evaluate Go positions. They showed 
that it is important to use a rich reinforcement signal and a sparsely connected 
network architecture that reflects the local character and translational invariance 
of the pattern-recognition task. This was accomplished by using 5x5 receptive 
fields with weight sharing. 

However, essential features of a Go position depend on whether two points 
on the board are connected by one colour or will become connected later. Using 
fixed-size receptive fields makes the recognition of long distance connections 
impossible. The most basic cases are blocks. Blocks are sets of adjacent stones 
of the same colour; they can take an arbitrary shape on the board. Moreover, 
they can only be captured as a unit. 

The Go program NeuroGo (version 2) used receptive fields that dynam- 
ically adapt their size to fit around blocks (Enzenberger, 1996). This was 
achieved by transforming the Go position into a graph with all stones of a block 
merged into a single node. While NeuroGo's performance was improved
greatly compared to a network using fixed-size receptive fields, it was still
impossible for the network to represent higher-level objects like groups. A group
is a set of loosely connected blocks that might become connected later. This
was the motivation for the development of a new network architecture with 
better abilities for segmenting the board. 

2. Architecture using Soft Segmentation 

This section presents a neural-network architecture that is able to process a 
Go position by building a soft segmentation of the position. This architecture 
is now used in version 3 of NeuroGo. 

The neural network uses a feedforward backpropagation architecture. The 
neurons have a sigmoid activation function, with activation values between 0 
and 1 and a bias weight. The soft segmentation of a position is represented as 
two connectivity maps, one for each colour. Each connectivity map assigns a 
connectivity strength between 0 and 1 to each pair of points on the board. See 
Figure 1 for an overview of the network architecture. 

The next section describes the reinforcement signal that is used for learning, 
followed by a description of the layers in the network and the connections 
between them. 
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Figure 1. Network architecture. 

2.1 Reinforcement Signal 

The final position of a Go game contains much richer information than merely 
the global score. The network uses single-point eyes, connections, and live 
points which are defined as follows. 

■ A single-point eye is an empty point with all adjacent points occupied by 
stones of the same block or by stones of two blocks of the same colour 
that share another single-point eye. 

■ A pair of points is connected by one colour if there is a path between 
them containing only stones of that colour or single-point eyes. 

■ A point is said to be alive if it is connected to two single-point eyes. 

Chinese scoring rules are used during the network training. No pass move is 
allowed until all points on the board are alive¹. Also, it is not allowed to play
in one’s own single-point eyes. 

Single-point eyes and connections can occur in earlier positions of the game, 
but may not exist in the end position, because those blocks could have been 
captured. Live points stay alive from the first position in which they occur until 
the end position. 

The network uses single-point eyes, connections and live points as a rein- 
forcement signal. Connections are used only locally within a 3 x 3 window 
centred around each point. 



¹ This makes scoring and detection of the end of the game easy, but will lead to wrong play in case of seki
situations or more complicated single-point eyes (involving more than two blocks). However, these cases
rarely occur in actual games.
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2.2 Neuron Layers 

Each layer of neurons in the architecture contains one or more neurons for
each point on the board. There are 7 layers as follows.

Input layer: This layer contains one or more neurons per point depending on 
the number of (boolean) input features that are used. The activation of 
the neurons is set to 0 or 1 according to whether a certain input feature 
is present in the Go position at this point. A Go position is always 
transformed such that Black is to move. This makes an additional input 
for indicating what colour is to move unnecessary. 

First hidden layer: This layer contains one or more neurons per point. The 
number of neurons per point is a parameter of the network architecture. 
The layer is connected with receptive fields to the input layer. 

Second hidden layer: Like the first hidden layer, this layer contains one or 
more neurons per point. The number of neurons per point is another pa- 
rameter of the network architecture. The layer is connected with receptive 
fields to the first hidden layer. 

Simple eyes layer: This layer contains 2 neurons per point, one for each colour. 
The activation is a prediction of whether that colour is able to create a 
single-point eye at this point. The layer is connected with receptive fields 
to the first hidden layer. It receives a reinforcement signal when a simple 
eye is created on the board. 

Local connections layer: This layer contains 18 neurons per point, 9 for each 
colour. The activation is a prediction of whether that colour is able to 
create a connection from this point to each of the 9 points in a 3 x 3 window 
around this point (including self-connection). Neurons corresponding 
to off-board points are unused. The layer is connected with receptive 
fields to the first hidden layer. It receives a reinforcement signal when a 
connection is created on the board. 

Global connectivity layer: This layer contains 2 · n² neurons per point for
board size n. The activation is a prediction of whether each colour is able
to create a connection from this point to any point on the board. The
activation is computed by the connectivity pathfinder (see 2.5) from the
local connections layer.

Evaluation layer: This layer contains 1 neuron per point. The activation is
a prediction of whether this point will be alive for Black (activation 1) or
White (activation 0). The layer is connected to the second hidden layer 
and the simple eyes layer by connectivity-based weight selection (see 
2.6). It receives a reinforcement signal for live points when they are 
created on the board. 
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2.3 Point Types 

Each neuron corresponds to a point. Different point types are defined. The 
actual weights are chosen from weight sets depending on the point type. 

There are two reasons for using weight sets. They increase the number of free
parameters without significantly affecting the time for processing a position, 
since only one weight of a set is selected. They also compensate for effects of 
the edge of the board while still making it possible to learn local patterns that 
are mostly invariant with respect to translation. 

The function type(p) assigns a type to each point p. See Figure 2 for the 
point types that were used. 



type(p)
0  Empty corner point
1  Empty edge point next to corner
2  Other empty edge point
3  Empty point diagonal from corner
4  Other empty point on second line
5  Empty point on line 3 or higher
6  Black stone
7  White stone

Figure 2. Point types: definition and example. 
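
The point types listed in Figure 2 depend only on the stone at a point and on the point's distance from the board edges. The following Python sketch illustrates one way to compute type(p); the board representation (a list of strings using '.', 'X' and 'O') and the name point_type are assumptions made for this example and are not taken from NeuroGo.

```python
EMPTY, BLACK, WHITE = ".", "X", "O"   # assumed board symbols


def point_type(board, row, col):
    """Return the point type 0..7 of Figure 2 for the point (row, col), 0-based."""
    n = len(board)
    stone = board[row][col]
    if stone == BLACK:
        return 6
    if stone == WHITE:
        return 7
    # Distance from the nearest edge along each axis (0 means first line).
    dr = min(row, n - 1 - row)
    dc = min(col, n - 1 - col)
    lo, hi = min(dr, dc), max(dr, dc)
    if lo == 0:                        # point on the first line
        if hi == 0:
            return 0                   # corner point
        return 1 if hi == 1 else 2     # next to corner / other edge point
    if lo == 1:                        # point on the second line
        return 3 if hi == 1 else 4     # diagonal from corner / other
    return 5                           # line 3 or higher
```

On an empty 9 x 9 board the top edge evaluates to the types 0, 1, 2, 2, 2, 2, 2, 1, 0, and the second line to 1, 3, 4, 4, 4, 4, 4, 3, 1.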

2.4 Receptive Fields 

The function window(p) assigns to each point p the set of points within a 3 x 3
square window centred at this point. If a layer is connected with receptive fields
to a previous layer then each neuron corresponding to a point p is connected to
all neurons in the previous layer corresponding to the points p' ∈ window(p).
The spatial relationship of two points p and p' is described by a field index given
by the function field(p, p') (see Figure 3).

Consider a layer L with n neurons per point connected to a previous layer
L' with m neurons per point by receptive fields. Then a neuron corresponding
to a point p and index i ∈ {1..n} is connected to all neurons in the previous
layer corresponding to points p' ∈ window(p) and index j ∈ {1..m} using the
weights

$w^{LL'}_{i,\,j,\,\mathrm{type}(p),\,\mathrm{type}(p'),\,\mathrm{field}(p,p')}$

The neuron has a bias weight $b^{L}_{i,\,\mathrm{type}(p)}$.
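
As an illustration of how the weight sets are indexed, the sketch below computes the net input of one neuron from its receptive field. It is a minimal example under stated assumptions: points are (row, col) tuples, indices i and j start at 0, weights and biases are plain dictionaries, and the row-major numbering of field indices is hypothetical (the actual numbering is the one shown in Figure 3). The sigmoid activation would then be applied to the returned value.

```python
def window(p, n):
    """Points of the 3 x 3 window centred at p that lie on an n x n board."""
    r, c = p
    return [(r + dr, c + dc)
            for dr in (-1, 0, 1) for dc in (-1, 0, 1)
            if 0 <= r + dr < n and 0 <= c + dc < n]


def field(p, q):
    """Field index 0..8 of q relative to p (assumed row-major numbering)."""
    return (q[0] - p[0] + 1) * 3 + (q[1] - p[1] + 1)


def net_input(p, i, prev_act, types, weights, bias, n):
    """Net input of neuron i at point p.

    prev_act[q] lists the activations of the previous layer at point q.
    The weight is selected by (i, j, type(p), type(q), field(p, q)), so the
    same weights are shared by all points with the same types and geometry.
    """
    total = bias[(i, types[p])]
    for q in window(p, n):
        for j, activation in enumerate(prev_act[q]):
            key = (i, j, types[p], types[q], field(p, q))
            total += weights[key] * activation
    return total
```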



[Board diagram for Figure 2: a 9 x 9 example position, columns A to J and rows 9 to 1, with every point labelled by its type; for instance, the empty top edge reads 0-1-2-2-2-2-2-1-0.]



Figure 3. Receptive fields. Field indices for 2 receptive fields centred at A1 and G7. 



2.5 Connectivity Pathfinder 

The connectivity pathfinder creates a global connectivity map from the local 
connections layer. It assigns a connection value between 0 and 1 to each pair 
of points for each colour. 

Local connections are assumed to be independent. Connection values of 
points outside the local connection window are computed as the product of the 
local connection values. Only the path resulting in the highest connection value 
is considered. The current implementation of the pathfinder runs Dijkstra’s 
shortest-path algorithm with each point as a starting point. 
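
Because every local connection value lies between 0 and 1, the product along a path can only shrink as the path grows, so Dijkstra's algorithm can be run with "largest product" in place of "shortest distance". The sketch below shows this for a single starting point and a single colour. The dictionary local, the function name and the choice to give the starting point the value 1 are assumptions made for the example, not details of the NeuroGo implementation.

```python
import heapq


def connectivity_from(start, local, n):
    """Best connection value from start to every reachable point of an n x n board.

    local[(p, q)] is the local connection value in [0, 1] from p to a point q
    in the 3 x 3 window around p (missing entries count as 0). A path is worth
    the product of its local values; only the best path to each point is kept.
    """
    best = {start: 1.0}
    heap = [(-1.0, start)]             # max-heap emulated with negated values
    done = set()
    while heap:
        negated, p = heapq.heappop(heap)
        if p in done:
            continue
        done.add(p)
        r, c = p
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                q = (r + dr, c + dc)
                if q == p or not (0 <= q[0] < n and 0 <= q[1] < n):
                    continue
                value = -negated * local.get((p, q), 0.0)
                if value > best.get(q, 0.0):
                    best[q] = value
                    heapq.heappush(heap, (-value, q))
    return best
```

Calling the function once for every point of the board, as the text describes, yields the full connectivity map for one colour.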

2.6 Connectivity-based Weight Selection 

The simple eyes layer and second hidden layer are connected to the evaluation 
layer using connectivity-based weight selection. 

Every neuron in the evaluation layer is connected to all neurons in the pre-
vious layer with weights depending on the connection value between the cor-
responding points predicted by the global connectivity layer. For that purpose
connection values are transformed from the continuous values between 0 and
1 into 8 equally sized intervals². For each colour c and pair of points p and p'
the function connection(c, p, p') ∈ {1..8} returns the index of the interval.

Consider a neuron corresponding to a point p in the evaluation layer E. The
neuron has a bias weight $b^{E}_{\mathrm{type}(p)}$. Let L be one of the previous layers to which
the evaluation layer is connected by connectivity-based weight selection (the
simple eyes layer or second hidden layer), with n neurons per point. Then the
neuron is connected to all neurons in the previous layer corresponding to points
p' and index i ∈ {1..n} using the weights

$w^{EL}_{i,\,\mathrm{type}(p),\,\mathrm{type}(p'),\,\mathrm{connection}(c,p,p')}$

for both colours c.

² For efficiency, points with a connection value smaller than 0.1 were ignored.
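
The mapping from a continuous connection value to one of the 8 interval indices can be sketched as follows. The half-open interval boundaries, the treatment of the value 1.0 and the use of None for ignored points are assumptions of this example; the text only states that there are 8 equally sized intervals and that values below 0.1 were ignored.

```python
def connection_index(value, intervals=8, cutoff=0.1):
    """Map a connection value in [0, 1] to an interval index 1..intervals.

    Returns None for values below the cutoff, which are ignored.
    Interval k covers [(k - 1) / intervals, k / intervals); 1.0 falls in the last one.
    """
    if value < cutoff:
        return None
    return min(int(value * intervals), intervals - 1) + 1
```

With these assumptions a connection value of 0.5 selects the weight for interval 5 and a value of 1.0 selects interval 8.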

3. Learning 

The learning is described according to the usual distinction between training 
(subsection 3.1) and testing (subsection 3.2). 

3.1 Training 

Games for training are produced by self-play. A move is selected by using 1- 
ply look-ahead with the sum of all outputs in the evaluation layer as the scoring 
function. 

Although training on larger board sizes provides more reinforcement sig- 
nal for each position, the 1-ply look-ahead would slow it down considerably. 
Therefore the experiments were done on a 9 x 9 board. However, the network 
architecture allows retraining the network on increasing board sizes to adapt it 
to the different ratios between edge and centre points. 

For better exploration of the state space, in 15% of the moves, instead of play- 
ing the move with the highest score, Gibbs sampling (Geman and Geman, 1984) 
over the move scores was used. The (unnormalised) probability of selecting a 
move with score s was 



P(s) = exp (s/T) 



with a temperature T of 4.0. These positions were not used for training.
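
The move selection described above can be sketched in a few lines of Python. The function name, its return value and the rng parameter are illustrative assumptions; only the 15% exploration rate, the temperature of 4.0 and the weighting exp(s/T) come from the text. Subtracting the maximum score before exponentiating does not change the sampling probabilities and avoids overflow.

```python
import math
import random


def select_move(scores, temperature=4.0, explore=0.15, rng=random):
    """Return (index, sampled) for a list of move scores.

    With probability `explore` the move is drawn with weight exp(s / T);
    otherwise the highest-scoring move is played. The flag `sampled` lets the
    caller exclude exploratory positions from training.
    """
    greedy = max(range(len(scores)), key=lambda k: scores[k])
    if rng.random() >= explore:
        return greedy, False
    top = max(scores)                  # shift scores for numerical stability
    weights = [math.exp((s - top) / temperature) for s in scores]
    index = rng.choices(range(len(scores)), weights=weights, k=1)[0]
    return index, True
```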

After each played game, the 10 most recent games were trained using temporal-
difference learning with λ = 0 (Sutton, 1988). The games were trained in
random order going backward from the end position with immediate update of
the weights after each position. The reason for the small value of λ is that most
parts of the network see only a portion of the board, so that the effective length 
of the game is not the number of moves in the global game, but is the number 
of moves in a part of the board. 

The weights were updated by backpropagation. All neurons in layers that re- 
ceive a reinforcement signal by the temporal difference algorithm were treated 
as output neurons in the backpropagation algorithm. The algorithmically com- 
puted connections to and from the global connectivity layer did not take part in 
the backpropagation algorithm. 
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3.2 Testing 

After the first 100 games and every 5,000 games thereafter, the performance 
was tested by playing 100 games on a 9 x 9 board against the program GnuGo 
version 3.0.0, released in 2001 (GnuGo, 2001). On the NNGS Go Server, the 
rating of GnuGo in 2001 was about 13 kyu (NNGS, 2001). 

To obtain a variety of different games, every move of the network was selected 
by Gibbs sampling over the score with a temperature of 0.33. GnuGo always 
played White, the komi was 5.5. Identical games or games that could be mapped 
to other games by rotation and mirroring were sorted out. 
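
One way to detect games that are identical up to rotation and mirroring is to reduce every game to a canonical form under the 8 symmetries of the square board. The sketch below assumes a game is a list of (row, col) moves without passes; the function names are hypothetical and the approach is only an illustration of the filtering step, not the procedure actually used in the experiments.

```python
def symmetries(move, n):
    """The 8 images of a move under rotations and reflections of an n x n board."""
    r, c = move
    m = n - 1
    return [(r, c), (r, m - c), (m - r, c), (m - r, m - c),
            (c, r), (c, m - r), (m - c, r), (m - c, m - r)]


def canonical_game(moves, n=9):
    """Smallest of the 8 transformed move sequences.

    Two games receive the same canonical form exactly when one can be mapped
    to the other by a rotation or reflection of the board.
    """
    if not moves:
        return ()
    variants = zip(*(symmetries(move, n) for move in moves))
    return min(tuple(variant) for variant in variants)
```

Keeping a set of canonical forms while the test games are played is then enough to discard duplicates.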

The error of the mean value of the average score and percentage of wins is 
given by the standard deviation of the values divided by the square root of the 
number of games. However, this does not take into account partial correlations 
between the games. To get a more robust estimation of the error it is helpful 
to look at the deviation of the values late in the training process. At this time 
the changes in the weights of the network are small, so that no big change in the
playing strength is expected. From the reproducibility of the values between 
slightly different networks the error of the average score is estimated to be ±5 
points and the error of the percentage of wins ±10%. 
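
The simple error estimate mentioned at the start of this subsection (standard deviation divided by the square root of the number of games) can be written as below. The use of the sample variance with n - 1 in the denominator is an assumption of this sketch.

```python
import math


def mean_and_standard_error(values):
    """Mean of the values and the standard error of that mean."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)   # sample variance
    return mean, math.sqrt(variance / n)
```

For 100 games with 50 wins, coded as 1 and 0, this gives a standard error of about 0.05, that is 5 percentage points. The ±10% adopted in the text is the larger, more cautious estimate derived from the reproducibility between networks.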

4. Experiments 

The description of the experiments consists of two parts: the setup (subsection
4.1) and the results (subsection 4.2). 

4.1 Setup 

The size of the network was chosen to be 8 neurons per point in the first 
hidden layer and 2 neurons per point in the second hidden layer. The learning 
rate for the weight update was 3 · 10⁻⁴. The performance of the network was
compared using two kinds of input. 

Raw board: Only 1 neuron per point was used in the input layer with constant 
activation 1. This corresponds to providing the network only with the raw
Go position as input, because the location of the stones is already used 
implicitly in the selection of the weights from the weight sets according 
to the point types. 

Preprocessed board: The position was preprocessed and some local features
of the position were used as input for the network. Only simple features
that can be computed quickly and inexpensive tactical searches were
used. The features included: number of stones and liberties of blocks,
a weighted sum of higher-grade liberties (PON-estimation for blocks
without any concept of groups (Tajima and Sanechika, 1998)) and the
results of simple tactical searches (ladders (Sensei, 2003)). Also, basic
link patterns (straight 2 and 3 point jump, knight jump, long knight jump)
were detected. See Table 1 for a detailed listing of the inputs.



Input for empty points
0...5    Black has 0, 1, 2, 3, 4, >4 liberties if playing here
6...11   White has 0, 1, 2, 3, 4, >4 liberties if playing here
12       Black can be captured in a ladder if playing here
13       White can be captured in a ladder if playing here
14       Single-point eye for Black
15       1 move necessary for single-point eye for Black
16       2 moves necessary for single-point eye for Black
17       >2 moves necessary for single-point eye for Black
18       Ponnuki shape for Black (Sensei, 2003)
19       1 move necessary for single-point eye for White
20       2 moves necessary for single-point eye for White
21       >2 moves necessary for single-point eye for White
22       Ponnuki shape for White
23       Move by Black here puts some white block in atari
24       Point is part of link pattern for Black
25       Point is part of link pattern for White

Input for occupied points
0...7    PON is <-1.5, -0.5, 0.5, 1.5, 2.5, 3.5, 4.5, >4.5 (Tajima and Sanechika, 1998)
8        Block can be captured in a ladder if opponent moves first
9        Block can be captured in a ladder if its colour moves first
10...13  Number of liberties of block is 1, 2, 3, >3
14...18  Number of stones of block is 1, 2, 3, 4, >4



Table 1. Preprocessed input. 



4.2 Results 

The training took several weeks of CPU time on an Athlon XP 1800. Figure 
4 shows the results of the test games against GnuGo. The network using the 
raw board input achieves an average score of about -25 points after 40,000 
games. The network using the preprocessed input achieves an average score of 
about -5 points after 10,000 games. 

Figures 5 and 6 show an example position with the evaluation output and 
the connectivity map for a point of the network using the preprocessed input. 
The network considers the left white group to be safe (an output of 0.2 is
equivalent to an 80% probability of becoming alive for White) but the centre
group at F4 to be unsafe (40% probability of becoming alive). The reason can be
seen in the connectivity map for F4 in Figure 6: the probability for White to
connect F4 to B4 is only 40%.




Figure 4. Average score and percentage of wins against GnuGo, plotted against the
number of games played. The error of the average score is estimated
to be ±5 points and the error of the percentage of wins ±10% (see subsection 3.2).
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Figure 5. Example position (last move 
White E2). The numbers show the evalua- 
tion output of the network using preprocessed 
input. 



Figure 6. Example position (last move 
White E2). The numbers show the connec- 
tivity map of the network using preprocessed 
input for White from the point F4. 



A complete game of the network versus GnuGo is shown in Figure 7. 
GnuGo played Black in this game. The game was played with the network 
using preprocessed input after the training was finished. The network does a 
good job in keeping the black stones separated (with one mistake at move White 
36) and wins by 8.5 points. 



Figure 7. Example game of the network using preprocessed input versus GnuGo (here 
playing Black). White wins by 8.5 points. 
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5. Conclusion 

It was shown that the presented neural-network architecture can be success- 
fully used for evaluating Go positions. Considering that the best Go programs 
currently play at a level around 8 kyu, the good performance against a 13 kyu 
program is promising. In particular, the approach addresses a weakness that 
current Go programs have in handling complicated tactical situations with many 
nearby weak groups. However, it is clear that a static evaluation cannot handle 
all kinds of positions. Thus, it will be necessary to add more local tactical 
search results to the input, and/or use the network as an evaluation function in 
a global search. 

The most current version of NeuroGo uses the described architecture with 
more neurons in the hidden layers and more sophisticated input features. This 
increases the average score against GnuGo 3.0.0 to about +2 points and the 
percentage of wins to about 50%. 
Abstract A new classification for eye shapes is proposed. It allows the status
of an eye to be decided statically under some restricted conditions. The life
property makes it possible to decide when an eye shape is alive regardless of
the number of opponent stones inside. The method is easy to program and can
replace a possibly deep search tree with a fast, reliable and static evaluation.

Keywords: computer Go, eye, neighbour classification, life property 

1. Introduction 

It is well known to both Go players and Go programmers that a string with
two eyes is alive. Though sufficient, this is not a necessary condition.
Sometimes one big eye is sufficient to live, either because it is possible to
make two eyes at any moment, or because the string is alive in seki.

This paper deals with the classification of large eyes and with the question of
when one big eye is sufficient to live. We propose an algorithm that answers
that question statically. It is easy to program and very fast. We present the
neighbour classification, a completely new concept that makes it possible to
group eye shapes with common interesting properties. We also introduce the
concept of life property, which makes it possible to decide when an eye shape
is alive regardless of the number of opponent stones inside. This property
relies only on the shape of the eye and, when applicable, is very powerful. It
is a completely safe tool as no heuristics are involved. It can be applied to a
wide variety of situations.
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Section 2 describes the existing work related to handling eyes and life and 
death. Section 3 sets accurate definitions of the concepts used throughout this 
paper. New concepts like end point or life property are proposed. We have also
enlarged Müller’s (1999) concept of plain eye so that more cases are covered
statically. Section 4 describes the main contribution of this paper, the neighbour
classification and the theorem of the neighbour classification. Section 5 shows
how to use the neighbour classification to identify the vital and end points for
centre eyes. Section 6 discusses the limitations of the theorem for side and
corner eyes and proposes possible ways to overcome them. Section 7 reports
the application of this new theory to semeai problems. Finally, we suggest
that the reader has a quick look at the first two paragraphs of Section 4 before
reading Section 3 so that the captions in the figures of Section 3 are clarifying
instead of confusing.

2. Previous Work 

Several approaches to life and death and to eye characterisation have been
proposed with great success.

Landman (1996) applies combinatorial game theory to determine a value for
a given eye space. Fotland (2002) describes the way his program, The Many
Faces of Go, analyses eyes. He represents each eye shape by its game tree with
four different values: the upper and lower bounds on the number of eyes, and
two intermediate values aiming to include the effects of ko and uncertainty.
This work deals mainly with a wide variety of general eyes. He combines static
analysis with a small search.

Chen and Chen (1999) show a method to evaluate heuristically life for general
classes of groups. Müller (1997) extends Benson’s algorithm by describing the
safety of blocks under alternating play.

Big eyes are of great importance in a wide variety of semeai problems.
Although they are not the key to the most common life-and-death problems,
when they appear it is fundamental to handle them properly. Most of the
existing techniques treat them unsatisfactorily: either they treat them
heuristically, so that unexpected situations may lead to a wrong answer,
or they just let the search algorithm continue until the eye becomes a small
eye, with the resulting inefficiency. Here we propose a theory and an
algorithm to deal statically with this problem. It is very fast, easy to program,
free of heuristic considerations and therefore completely reliable. It can
completely replace a possibly deep search tree in a large number of situations,
and it can be of great interest for enhancing the existing techniques and
reducing their degree of inaccuracy.
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3. Definitions 

Eye. In this paper an eye¹ will be an area completely surrounded by one
block. Opponent and own stones will be allowed in the eye region and also
empty points not adjacent to the surrounding block. This is a generalization of
Müller’s (1999) definition of plain eyes. We will classify eyes according to their
position on the board.

Corner Eye — The eye contains a corner point and its two neighbours (cf. Figure 1).

Side Eye — The eye is not a corner eye and contains at least three side points
(note: a corner point is a particular case of side point) (cf. Figure 2).

Centre Eye — All the eyes that are not corner or side eyes (note: most big eye
shapes can only be centre eyes (MathWorld, 2003)).

Figure 1. A corner [1122] eye, not plain.

Figure 2. A side [112234] eye, plain.

Figure 3. A centre [1122233] eye, plain (left); a centre [112224] eye, plain (middle), and a
centre [112224] eye, not plain (right).
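
The three positional classes defined above translate directly into a small test on the set of points of the eye. In the sketch below the eye is assumed to be given as a collection of 0-based (row, col) points on an n x n board; the function name and the returned strings are choices made for this example.

```python
def classify_eye(points, n):
    """Classify an eye as 'corner', 'side' or 'centre' from its points."""
    pts = set(points)
    last = n - 1
    for r, c in ((0, 0), (0, last), (last, 0), (last, last)):
        # The two neighbours of a corner point, one along each edge.
        neighbour_a = (1 if r == 0 else last - 1, c)
        neighbour_b = (r, 1 if c == 0 else last - 1)
        if {(r, c), neighbour_a, neighbour_b} <= pts:
            return "corner"
    # Side points lie on the first line; corner points count as side points.
    side_points = [p for p in pts if p[0] in (0, last) or p[1] in (0, last)]
    return "side" if len(side_points) >= 3 else "centre"
```

Note that an eye touching the edge with only one or two points is a centre eye under these definitions.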



Eye Shape. This is the set of intersections of the eye. The intersections can
be empty, or occupied by opponent or friendly stones. We will use the term
Nakade Shape to refer to a set of intersections that, if it were an Eye
Shape, would have one or zero vital points.

Eye Status. We will define four possible statuses for a centre eye: Nakade,
Unsettled, Alive, and AliveInAtari.



¹ In the existing literature eye is used to refer to a small one-point eye, while bigger eyes are referred to as
X-enclosed region (Benson, 1976) or Big eye (Fotland, 2002). In this paper we mainly deal with big eyes,
therefore as no confusion is possible we will keep the term eye as we define it.
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Nakade — the eye will end up as only one eye and this will not be sufficient to 
live. A nakade eye can be the result of: (1) an eye with an empty set of vital 
points (cf. Figure 4b) or (2) an eye with all the set of vital points filled by the 
opponent’s stones (cf. Figure 4a). 




Figure 4a. A nakade status for a [112233]-q eye. The two vital points are
filled by the opponent.

Figure 4b. A nakade status for a [2222] eye. It has an empty set of vital points.



Unsettled: the eye can end up as a nakade eye or an alive eye depending on
the colour to play. An unsettled eye is the result of an eye with one and only
one empty intersection in the set of vital points (cf. Figure 5). An unsettled
status is what Landman (1996) defines as 1½.




Alive: the string owning the eye is alive no matter who plays first and no
matter what the surrounding conditions are. An alive eye can be the result of:
(1) an eye with two or more empty intersections in the set of vital points
(cf. Figure 6a) or (2) an eye that is an n-shape that cannot be filled by the
opponent with an (n − 1)-nakade shape (Figure 6b). We will make no distinction
between being alive and being alive in seki as in Figure 6b, since in many cases
being alive in seki may be almost as good as living with two eyes (Landman, 1996).



Figure 5. An unsettled status for a [1222234] eye. One of the two vital points
is empty (1).




Figure 6a. Alive status for a [1112234]-β eye. Even though this shape can be
filled with a rabbity six, A and B belong to the set of vital points, so we have
a miai of life.



Figure 6b. Alive status for a [11222] eye. No matter how many stones White
plays inside, Black is unconditionally alive. The opponent cannot fill the eye
space with a nakade shape of size four.

AliveInAtari: this is a particular case in which the surrounding conditions
determine the status of the eye. We say that an eye has an AliveInAtari status
if there are only one or zero empty intersections adjacent to the surrounding
block but capturing the opponent stones inside the eye grants an alive status.
Only when the external liberties of the string owning the eye are played is it
necessary to capture the stones inside the eye (cf. Figures 7a and 7b).




Figure 7a. AliveInAtari status for a [111223] eye. Capture grants life.

Figure 7b. AliveInAtari status for a [222233] eye. When A is played the status
changes to Unsettled, and if both A and B are played the status is Nakade.
However, capturing the stones inside the eye grants an alive status.



Vital Points. A minimal set (one or more) of intersections inside the eye that 
should be filled by the opponent to grant a nakade status for the eye (cf. Figure 
8a and 8b). 

End Points. A minimal set (one or more) of intersections inside the eye that 
should not be filled by the opponent until the end to grant a nakade status for 
the eye in the process of killing the string (cf. Figure 8a and 8b). 

This should not be confused with Fotland's (2002) number of ends. While
Fotland's concept deals with the shape, our concept deals with the order in
which the intersections of the eye should be filled by the opponent. In Figure
8a there is one end point but three of Fotland's ends. However, in most cases
an end point in the sense of this paper is also one of Fotland's ends.

Life Property. We will say that an eye shape has the life property if the
only possible statuses for this shape are Alive or AliveInAtari. Thus when an eye
shape has the life property we only need to check whether the stones inside the
eye should be captured due to an AliveInAtari status.

For example, a 3-shape in a line can have a Nakade, Unsettled, or AliveInAtari
status depending on the opponent stones played inside. This 3-shape does not
have the life property since a Nakade and an Unsettled status are possible. In
contrast, the [11222] shape shown in Figure 6b can only have an Alive status








Figure 8a. Vital (A) and end (□) points 
for a [11123] eye. 



Figure 8b. Vital (A) and end (□) point 
for a [1122224] eye. 



or an AliveInAtari status (when four out of the five intersections are played by
the opponent), so this shape has the life property. Therefore, while all the
shapes having the life property are alive, not all the shapes having an alive
status have the life property.

The life property should be regarded as a property slightly below Benson's
(1976) unconditional life, because with an AliveInAtari status it might be
necessary to play inside the eye. Its great advantage is that detecting it
is just a matter of counting neighbours, as will be shown in Section 4.

4. Neighbour Classification 

Let $\mathcal{S}_i$ be the set of all possible eye shapes of size $i$. Note that
for $i = 1..6$ there is an isomorphism between $\mathcal{S}_i$ and $\mathcal{P}_i$,
where $\mathcal{P}_i$ is the set of free $i$-polyominoes (Mathworld, 2003). An
n-polyomino (or "n-omino") is defined as a collection of n squares of equal size
arranged with coincident sides. Free polyominoes can be picked up and flipped,
so mirror image pieces are considered identical. For size seven we should
discard the holed polyomino to keep the isomorphism.

Let $e \in \mathcal{S}_i$. We define the Neighbour Classification of $e$,
$NC(e)$, as a number of $i$ digits sorted from low to high; every intersection
in the eye space is associated with a digit that indicates the number of
neighbours (adjacent intersections) of that intersection that belong to the eye
space (cf. Figure 9).
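
As a minimal sketch of this definition, the neighbour classification can be computed by counting, for each intersection of the eye space, how many of its four adjacent intersections also belong to the eye space. The function name and the (row, column) coordinate convention are assumptions made for this illustration only; they do not come from the paper.

```python
# Hypothetical helper: an eye shape is given as a collection of (row, col) cells.
NEIGHBOURS = ((1, 0), (-1, 0), (0, 1), (0, -1))


def neighbour_classification(cells):
    """Return NC(e) as a string of digits sorted from low to high."""
    cells = set(cells)
    counts = sorted(
        sum((r + dr, c + dc) in cells for dr, dc in NEIGHBOURS)
        for r, c in cells
    )
    return "".join(str(n) for n in counts)


# The rabbity six, laid out as
#   . X .
#   X X X
#   X X .
RABBITY_SIX = [(0, 1), (1, 0), (1, 1), (1, 2), (2, 0), (2, 1)]
SQUARE_FOUR = [(0, 0), (0, 1), (1, 0), (1, 1)]

print(neighbour_classification(RABBITY_SIX))  # 112224
print(neighbour_classification(SQUARE_FOUR))  # 2222
```

Because only adjacency inside the shape is counted, the result does not depend on where the shape sits on the board or how it is rotated or mirrored.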






Figure 9. For the rabbity six $NC(e) = 112224$.

Let $\sim$ be the following equivalence relation: let $e_1, e_2 \in \mathcal{S}_i$, then

$$e_1 \sim e_2 \iff NC(e_1) = NC(e_2)$$

Thus $\sim$ gives a partition of $\mathcal{S}_i$ defined by the equivalence
classes in $\mathcal{S}_i/\sim$ (cf. Appendix B).

Example. Given $\mathcal{S}_5$ we can find four different neighbour
classifications for its elements (note that $|\mathcal{S}_5| = |\mathcal{P}_5| = 12$)
(Mathworld, 2003).

$$NC(e) \in \{11222, 11123, 11114, 12223\}, \quad \forall e \in \mathcal{S}_5$$
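
The example can be checked mechanically. The following sketch enumerates the free polyominoes of size five by growing smaller shapes one cell at a time and groups them by neighbour classification. All function names are assumptions for this illustration, and the enumeration method is a generic one, not the authors' procedure.

```python
from collections import Counter

NEIGHBOURS = ((1, 0), (-1, 0), (0, 1), (0, -1))


def neighbour_classification(cells):
    """NC(e): per-cell neighbour counts inside the shape, sorted low to high."""
    cells = set(cells)
    counts = sorted(
        sum((r + dr, c + dc) in cells for dr, dc in NEIGHBOURS)
        for r, c in cells
    )
    return "".join(str(n) for n in counts)


def normalise(cells):
    """Translate the cells so that the smallest row and column are 0."""
    cells = list(cells)
    min_r = min(r for r, _ in cells)
    min_c = min(c for _, c in cells)
    return frozenset((r - min_r, c - min_c) for r, c in cells)


def canonical(cells):
    """Pick one representative among the 8 rotations and reflections."""
    variants = []
    current = list(cells)
    for _ in range(4):
        current = [(c, -r) for r, c in current]  # rotate by 90 degrees
        variants.append(normalise(current))
        variants.append(normalise([(r, -c) for r, c in current]))  # mirror
    return min(variants, key=sorted)


def free_polyominoes(n):
    """All free n-ominoes, obtained by adding one cell at a time."""
    shapes = {canonical([(0, 0)])}
    for _ in range(n - 1):
        grown = set()
        for shape in shapes:
            for r, c in shape:
                for dr, dc in NEIGHBOURS:
                    cell = (r + dr, c + dc)
                    if cell not in shape:
                        grown.add(canonical(shape | {cell}))
        shapes = grown
    return shapes


classes = Counter(neighbour_classification(s) for s in free_polyominoes(5))
print(len(free_polyominoes(5)), dict(classes))
```

For size five this yields twelve shapes in the four classes listed above. For size seven the same enumeration would also produce the holed polyomino, which the text excludes.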

Theorem 1 (of the neighbour classification) Let $e$ be a centre eye,
$e \in \mathcal{S}_i$ and $[e] \in \mathcal{S}_i/\sim$ the equivalence class of
$e$, for $i = 1..7$. If $e$ has the life property then $\forall f \in [e]$, $f$
has the life property; conversely, if $e$ does not have the life property then
$\forall f \in [e]$, $f$ does not have the life property.

Proof: For $i \in \{1, 2, 3, 4\}$ there is no eye shape that has the life
property, so the theorem holds. The 1-shapes and 2-shapes are always nakade; the
two existing 3-shapes have one vital point, so their status can be nakade or
unsettled (depending on whether the opponent has played the vital point). There
are five 4-shapes with zero, one or two vital points. All of them can have a
nakade status if the opponent plays all the vital points. The interesting point
comes with shapes of larger size.

Under the conditions of the theorem, having the life property is just a matter
of shape. An i-shape has the life property if and only if it cannot be filled by
the opponent with an (i − 1)-shape that has one or zero vital points (a Nakade
Shape).
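
A minimal sketch of this criterion for centre eyes of size five to seven: remove each intersection in turn and test whether the remaining (i − 1) intersections form one of the nakade shapes named in the proof (square and pyramid for size four, bulky five and star for size five, rabbity six for size six). The coordinates chosen for those shapes and all function names are assumptions of this illustration.

```python
def normalise(cells):
    """Translate the cells so that the smallest row and column are 0."""
    cells = list(cells)
    min_r = min(r for r, _ in cells)
    min_c = min(c for _, c in cells)
    return frozenset((r - min_r, c - min_c) for r, c in cells)


def canonical(cells):
    """Pick one representative among the 8 rotations and reflections."""
    variants = []
    current = list(cells)
    for _ in range(4):
        current = [(c, -r) for r, c in current]
        variants.append(normalise(current))
        variants.append(normalise([(r, -c) for r, c in current]))
    return min(variants, key=sorted)


SQUARE = [(0, 0), (0, 1), (1, 0), (1, 1)]
PYRAMID = [(0, 0), (0, 1), (0, 2), (1, 1)]
BULKY_FIVE = [(0, 0), (0, 1), (1, 0), (1, 1), (2, 0)]
STAR = [(0, 1), (1, 0), (1, 1), (1, 2), (2, 1)]
RABBITY_SIX = [(0, 1), (1, 0), (1, 1), (1, 2), (2, 0), (2, 1)]

# Nakade shapes by size, as listed in the proof of Theorem 1.
NAKADE_SHAPES = {
    4: {canonical(SQUARE), canonical(PYRAMID)},
    5: {canonical(BULKY_FIVE), canonical(STAR)},
    6: {canonical(RABBITY_SIX)},
}


def has_life_property(cells):
    """Shape-only test for centre eyes of size at most seven."""
    cells = frozenset(cells)
    size = len(cells)
    if size > 7:
        raise ValueError("criterion only stated for centre eyes up to size 7")
    if size < 5:
        return False  # no shape of size 1..4 has the life property
    smaller = NAKADE_SHAPES[size - 1]
    # The opponent can fill all but one intersection; try every empty one.
    return not any(canonical(cells - {cell}) in smaller for cell in cells)


LINE_FIVE = [(0, i) for i in range(5)]
print(has_life_property(LINE_FIVE))    # True
print(has_life_property(BULKY_FIVE))   # False
```

If removing a cell disconnects the shape, the remainder cannot match a nakade shape, so the comparison simply fails for that cell.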

Ko cannot arise in the centre for eye shapes of size below seven. For size
seven there are only two shapes that can have a ko status in the centre (cf.
Figure 10). These shapes do not have the life property, as they can be filled by
a rabbity six, so the ko does not interfere with our theorem.




Figure 10. Shapes in classes [1222234] and [1122224] can have a ko status in the centre. 



5-shapes. There are two nakade shapes of size 4 (the square and the pyramid),
so all the 5-shapes that do not contain a square or a pyramid will have the life
property. All the shapes belonging to [11222] have the life property while the
others do not. The only nakade shapes of size five are the bulky five and the
star.

6-shapes. As before, all the 6-shapes that do not contain a bulky five or
a star have the life property. These are the 26 shapes belonging to classes
{[112222], [111223], [111133]}. The only nakade shape of size six is the rabbity
six.

7-shapes. Now we only have to care about those shapes containing a rabbity
six; there are five such shapes distributed in four equivalence classes. All the
other 7-shapes have the life property. There is no nakade shape of size seven.
This concludes the proof of Theorem 1. □

The exhaustive classification for all the eye shapes under size eight is sum- 
marized in Table 1. 



S_i    |S_i|   S_i/~        |[e]|   Life Property
-------------------------------------------------
S_1       1    [0]             1    No
S_2       1    [11]            1    No
S_3       2    [112]           2    No
S_4       5    [1122]          3    No
               [1113]          1    No
               [2222]          1    No
S_5      12    [11222]         7    Yes
               [11123]         3    No
               [11114]         1    No
               [12223]         1    No
S_6      35    [112222]       13    Yes
               [111223]       12    Yes
               [111133]        1    Yes
               [112233]        4    No
               [122223]        2    No
               [112224]        1    No
               [111124]        1    No
               [222233]        1    No
S_7     107    [1122222]      30    Yes
               [1112223]      40    Yes
               [1122233]      11    Yes
               [1111233]       8    Yes
               [1222223]       5    Yes
               [1111224]       4    Yes
               [1112333]       2    Yes
               [1222333]       2    Yes
               [1112234]       2    No
               [1222234]       1    No
               [1122224]       1    No
               [2222224]       1    No

Table 1. Neighbour Classification for i = 1..7.

The strength of the theorem lies in the fact that the life property depends only
on the shape. So, given an eye shape, we just have to find its neighbour
classification. If the class has the life property then, whatever the number of
opponent stones inside, we know that the group owning the eye is alive; we only
need to check whether it is necessary to capture the stones inside the eye due
to an AliveInAtari status. If the class does not have the life property, further
study is required to decide the status (cf. Section 5).
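
The lookup described here can be sketched as follows, using the "Yes" classes of Table 1 for centre eyes. The function names and return values are assumptions of this sketch. Treating "all but one intersection filled" as the AliveInAtari case follows the size-five example in Section 3 and the size-six code in Appendix A; extending it to size seven is an assumption.

```python
NEIGHBOURS = ((1, 0), (-1, 0), (0, 1), (0, -1))

# Classes with the life property for centre eyes, copied from Table 1.
LIFE_PROPERTY_CLASSES = {
    5: {"11222"},
    6: {"112222", "111223", "111133"},
    7: {"1122222", "1112223", "1122233", "1111233",
        "1222223", "1111224", "1112333", "1222333"},
}


def neighbour_classification(cells):
    cells = set(cells)
    counts = sorted(
        sum((r + dr, c + dc) in cells for dr, dc in NEIGHBOURS)
        for r, c in cells
    )
    return "".join(str(n) for n in counts)


def static_status(eye_cells, opponent_stones):
    """Return 'Alive', 'AliveInAtari', or None when further study is needed."""
    nc = neighbour_classification(eye_cells)
    size = len(nc)
    if nc not in LIFE_PROPERTY_CLASSES.get(size, set()):
        return None  # no life property: vital and end points must be examined
    if len(opponent_stones) == size - 1:
        return "AliveInAtari"  # the stones inside must be captured
    return "Alive"


LINE_FIVE = [(0, i) for i in range(5)]
RABBITY_SIX = [(0, 1), (1, 0), (1, 1), (1, 2), (2, 0), (2, 1)]

print(static_status(LINE_FIVE, []))              # Alive
print(static_status(LINE_FIVE, LINE_FIVE[:4]))   # AliveInAtari
print(static_status(RABBITY_SIX, []))            # None
```

The sketch ignores the surrounding liberties, which in a real program decide when the capture actually has to be played.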

We cannot extend the theorem to larger sizes in the centre because ko's
and opponent eyes may appear. But we will see that this is usually not much of
a problem, as the life property is an excessively strong condition for such
eyes.



5. Vital Points and End Points Identification 



Another interesting property of the neighbour classification is that, for
centre eyes, it allows us to find the vital and end points of a given eye shape
just by looking at its signature. Below we show the identification for the five
classes of size six without the life property. The identification for eye shapes
of sizes one to five is easy to work out, and size seven requires a procedure
analogous to that for size six.



For size six we have five different classes without the life property and thus
the status should be checked.

[112224]: The rabbity six is the only nakade shape of size six. The vital point
is the 4-neighbour point. The 2-neighbour point not neighbouring the vital point
may be considered an end point (cf. Figure 11). Although it need not be filled
last, only if it is filled should we test for a non-nakade shape inside.

[111124]: Vital points are the {2,4}-neighbour points and the end point is the
1-neighbour point neighbouring the 2-neighbour point (cf. Figure 12).

[222233]: Vital points are the two 3-neighbour points (cf. Figure 13). There is
no efficient way to define end points, so we should always test for a nakade
four zigzag inside.

Figure 11. Vital and end points for [112224].

Figure 12. Vital and end points for [111124].

Figure 13. Vital and end points for [222233].

[112233]: We need to create two subclasses in this class to perform the
identification. We define class [112233]-α as the subset of two elements in
class [112233] in which the {3,3} points are neighbours, and class [112233]-β as
the subset in which the {3,3} points are not neighbours. For α elements there
are two vital points, corresponding to the {3,3}-neighbour points, and two end
points, corresponding to the {1,1}-neighbour points. For β elements the end
points are the same, but all the other intersections are vital points (cf.
Figures 14 and 15).

[122223]: The end point is the only 1-neighbour point. For vital points we need
to consider the 3-neighbour point and its three neighbours (cf. Figure 16).

We do not know a single method that finds the vital and end points for every
kind of shape. So far a case-by-case implementation is needed, but in the
process the neighbour classification helps to determine them efficiently for
each given class.

Once the identification is done it is possible to give the status and the hot
point to play inside the eye, if necessary, depending on the opponent and friendly
stones played in the eye shape (cf. Appendix A).
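
As an illustration of how the signature drives the identification, the sketch below implements the rules stated above for three of the size-six classes ([112224], [111124] and [222233]). The function names and the coordinate layout of the sample shapes are assumptions of this sketch, and the remaining classes are deliberately left out.

```python
NEIGHBOURS = ((1, 0), (-1, 0), (0, 1), (0, -1))


def neighbour_counts(cells):
    """Map every cell of the eye shape to its number of neighbours in the shape."""
    cells = set(cells)
    return {
        (r, c): sum((r + dr, c + dc) in cells for dr, dc in NEIGHBOURS)
        for r, c in cells
    }


def adjacent(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1]) == 1


def vital_and_end_points(cells):
    """Return (vital, end) sets for the classes [112224], [111124], [222233]."""
    counts = neighbour_counts(cells)
    nc = "".join(str(n) for n in sorted(counts.values()))

    def points(k):
        return {p for p, n in counts.items() if n == k}

    if nc == "112224":
        vital = points(4)
        end = {p for p in points(2) if not any(adjacent(p, v) for v in vital)}
    elif nc == "111124":
        vital = points(2) | points(4)
        end = {p for p in points(1) if any(adjacent(p, q) for q in points(2))}
    elif nc == "222233":
        vital = points(3)
        end = set()  # the text defines no end points for this class
    else:
        raise NotImplementedError("class %s is not covered by this sketch" % nc)
    return vital, end


RABBITY_SIX = [(0, 1), (1, 0), (1, 1), (1, 2), (2, 0), (2, 1)]
CROSS_PLUS_ONE = [(0, 1), (1, 0), (1, 1), (1, 2), (2, 1), (1, 3)]   # [111124]
RECTANGLE = [(r, c) for r in range(2) for c in range(3)]            # [222233]

print(vital_and_end_points(RABBITY_SIX))   # ({(1, 1)}, {(2, 0)})
```

Classes [112233] and [122223] need the extra adjacency tests described in the text and would be added as further branches.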




6. Corner and Side Eyes 

To approach corner and side eyes we should first note the following implication
(NoLP = No Life Property):

NoLP Centre ⇒ NoLP Side ⇒ NoLP Corner

Thus, once the study for centre eyes is done, only classes with the life property
in the centre need to be checked on the side and in the corner.

For side eyes, Theorem 1 continues to be true for sizes from one to four. For
sizes five and six, ko only appears in classes that do not have the life property
in the centre, so the theorem continues to be true. For size seven there are two
classes ([1222333] and [1112333]) that have the life property in the centre but
fail to have it on the side due to ko situations. Unfortunately, class [1122233]
has seven out of eleven members that do not have the life property due to ko,
while the other four continue to have it on the side (cf. Figures 17a and 17b).
So the theorem is no longer applicable for side eyes of size seven.




Figure 17a. This element of class [1122233] has the life property on the side.

Figure 17b. This element of class [1122233] does not have the life property on
the side due to ko status.

Even though the theorem fails for side eyes, there is still a lot of knowledge
that can be used in an implementation to solve side eyes. We will only have to
consider more special cases. Shapes that do not have the life property will need
to be treated more carefully on the side. For example, we have seen that a
[112233]-α shape has two vital points. Therefore, if no vital point was played
by the opponent, in the centre we had an alive status. This is no longer true on
the side, as Figure 18 shows. What might be called an Unsettled-Ko status appears
for side eyes.

In the corner the situation is worse. Bent four in the corner, ko's and the
possibility for the opponent to easily make an eye inside the big eye make the
corner a difficult battleground in which to apply the theorem.

This is the moment to note how strong the condition of having the life property
is. Strange examples of eye shapes in the corner can be found. They do not have
the life property (ko's can arise) but they are almost impossible to kill in a
real game (cf. Figure 19).

However, the fact that the theorem is not applicable in the corner does not
mean that the neighbour classification is useless in those cases. It can be used
to classify shapes in a straightforward way. For example, for size six there are
only 12 out of 35 shapes that can be corner eyes. Six of these 12 are shapes of
classes [112222] and [111223]. These shapes can have their status easily decided
depending on the opponent stones played inside. For the other six shapes we
can just return an unknown status and let the search continue until they become
size-five corner shapes, which are not so hard to decide by means of a
case-by-case implementation.

Figure 18. If White plays A a ko will appear. If Black wins the ko the status
will be alive; if Black loses, it will be nakade.

Figure 19. A 12-size corner eye without the life property. If White plays □ a
ko status appears.

7. Application to Semeai Problems 

The neighbour classification has been successfully used and tested in semeai
problems. Following Müller's (1999) classification of semeais and using the
neighbour classification, we have been able to solve statically classes 0
to 2, and also all the semeais with centre and side eyes that are above class
2, either because the eye is not plain or because there is more than one
non-essential block inside the eye. This is an improvement over the results
achieved statically and reported by Müller (1999).

A representative subset of semeai problems solved using the neighbour clas- 
sification can be found at www.ai.univ-pauris8.fr/~ritx/semeai.zip. 



8. Conclusions 

Three new ideas about eyes are presented in this paper: the concept of end 
point, the definition of life property, and the neighbour classification. 

The neighbour classification and the life property provide a completely safe
tool for deciding eye status statically under some restricted conditions. The
method is easy to program and can, in many situations, replace a possibly deep
search tree with a fast, reliable and static evaluation.

For eye shapes that do not meet the initial conditions, like side and corner
eyes, we have shown that a great deal of useful knowledge coming from
the neighbour classification can still be used.

It has been tested on semeai problems and proved to be a powerful tool.

Appendix A: Implementation for 6-shapes centre eyes

Below we present the general guidelines for an implementation of an algorithm
to decide the status of size six centre eyes. We have suppressed irrelevant
details. Eye and Rzone should be regarded as classes that can store a set
of intersections on the board. The names of the variables have been chosen to
allow reading the implementation as if it were pseudo-code.

FindShapeVitalEnd takes e as input, decides the shape using the neighbour
classification and initializes the vital and end variables using the
explanations already given in Section 5.

InitLocals takes e, vital and end as input and initializes
EyeFilledSpace, vitalFilled and endFilled. The "Filled" variables contain
the intersections in the eye, the vital zone and the end zone that are filled
with opponent stones.

enum eye6_t { t112224, t111124, t222233, t122223, t112233a,
              t112233b, other6 };

void Size6_Centre( Eye &e ) {
    eye6_t shape = other6;
    Rzone vital, vitalFilled, end, endFilled, EyeFilledSpace;

    // Decide the class and its vital and end points (Section 5)
    FindShapeVitalEnd( &shape, &vital, &end, e );
    // Collect the opponent stones in the eye, the vital zone and the end zone
    InitLocals( &EyeFilledSpace, &vitalFilled, &endFilled, vital, end, e );

    switch ( shape ) {
    case other6:  // the shape has the life property
        if ( EyeFilledSpace.size() == 5 )
            e.setEyeStatus( AliveInAtari );
        else
            e.setEyeStatus( Alive );
        break;

    case t111124:
        if ( endFilled.size() == 0 ) {
            switch ( vitalFilled.size() ) {
            case 0:
                e.setEyeStatus( Alive );
                break;
            case 1:
                e.setEyeStatus( Unsettled );
                // set Hot Spot: the intersection in vital not present in
                // vitalFilled
                break;
            case 2:
                e.setEyeStatus( Nakade );
                break;
            }
        }
        else {
            if ( EyeFilledSpace.size() == 5 )
                e.setEyeStatus( AliveInAtari );
            else
                e.setEyeStatus( Alive );
        }
        break;

    case t122223:
        if ( endFilled.size() == 1 ) {
            if ( EyeFilledSpace.size() == 5 )
                e.setEyeStatus( AliveInAtari );
            else
                e.setEyeStatus( Alive );
        }
        else {
            switch ( vitalFilled.size() ) {
            case 0:
            case 1:
            case 2:
                e.setEyeStatus( Alive );
                break;
            case 3:
                e.setEyeStatus( Unsettled );
                // set Hot Spot
                break;
            case 4:
                e.setEyeStatus( Nakade );
                break;
            }
        }
        break;

    case t222233: [...]
        break;

    case t112224: [...]
        break;

    case t112233a: [...]
        break;

    case t112233b: [...]
        break;
    }



}

Appendix B: Equivalence classes for {5,6,7}-shapes

Below we present the complete set of eye shapes of size five, six and seven
grouped by equivalence classes.

Figure 21. The complete set of hexominoes grouped by classes.

Figure 22. The complete set of size seven eye shapes grouped by classes.

Abstract

Search algorithms based on the notion of proof and disproof numbers have been
shown to be effective in many games. In this paper, we modify the depth-first
proof-number search algorithm df-pn in order to apply it to the game of Go. We
develop a solver for one-eye problems, a special case of enclosed tsume-Go (life
and death) problems. Our results show that this approach is very promising.

Keywords: Go, proof-number search, df-pn, one-eye problem 

1. Introduction 

Computer Go is one of the ultimate challenges for games researchers. Despite
considerable effort, the best programs can still be easily beaten even by human
players of moderate skill.

One weakness of current Go programs is recognizing whether groups are 
alive or dead. Such tsume-Go (life and death) problems play a critical role in 
deciding the outcome of many games. Currently most Go-playing programs 
rely on a combination of exact and heuristic rules to evaluate tsume-Go (Chen 
and Chen, 1999; Kraszek, 1988). However, this approach does not always 
guarantee the correctness of the results. 

In general, search is the only way to assess correctly the life-and-death status
of stones. However, the large branching factor of Go makes it hard to apply a
purely search-based approach. For enclosed tsume-Go problems with a small to
moderate branching factor, the state of the art is already very good. GoTools,
the currently best tsume-Go solver, achieves high dan amateur level (Wolf,
2000). Search-based approaches have been very successful in other games such
as chess, Othello, and shogi. In particular, in tsume-shogi (shogi checkmating
problems), algorithms using proof and disproof numbers such as Seo's PN*
and Nagai's df-pn (Seo, 1995; Nagai, 2002) have solved all difficult problems,
including those with solution sequences of hundreds of plies. Their performance
far surpasses that of human players.

In this paper, we adapt the df-pn algorithm to the game of Go, and apply it to 
a restricted version of tsume-Go: the problem of making one eye in an enclosed 
position. This special case can be solved with a simpler evaluation function, 
but retains all the search-related difficulties of tsume-Go. To our knowledge, 
this is the first attempt to apply df-pn to computer Go. Our results are very 
promising. Even with very modest game-specific enhancements, our df-pn- 
based solver can quickly solve enclosed positions up to about 18 empty points. 
This compares favourably to state of the art tsume-Go solvers, which can solve 
general tsume-Go problems of up to about 14 empty points in reasonable time. 

The structure of this paper is as follows. Section 2 describes the one-eye 
problem in Go and related work on tsume-Go. Section 3 reviews the df-pn 
algorithm. Section 4 explains a problem of df-pn in domains with position 
repetition, and develops a solution. Section 5 describes the basic one-eye solver 
and a few problem-specific enhancements. Section 6 deals with our current 
implementation of ko threats. Section 7 discusses the experimental results. 
Section 8 concludes and outlines further research directions. 

2. The One-Eye Problem in Tsume-Go 

The one-eye problem in Go is the question whether a player can create an 
eye connected to the player’s stones in a given region. Although the problem 
is simpler than full tsume-Go, it has many issues in common. For example, 
every tsume-Go problem in which the group under attack already has one eye 
in some region reduces to the one-eye problem on the rest of the board. 

A specialized one-eye solver also promises to be useful to enhance the knowl- 
edge in a heuristic Go program. Typical current programs use elaborate heuristic 
rules to assign statically a number of eyes to a region of the board (Chen and 
Chen, 1999; Fotland, 2002). Replacing some of these heuristics by exact results 
can improve group strength estimation and thereby overall position evaluation. 

A one-eye problem in a given Go position is defined by the following inputs. 

■ The two players, called the defender and the attacker. The defender tries 
to make an eye and the attacker tries to prevent it. 

■ The region, a subset of the board. At each turn, a player must either make 
a legal move within the region or pass. 

■ One or more blocks of crucial stones of the defender. The defender wins 
a one-eye problem by creating an eye connected to all the crucial stones 
inside the region. The attacker can win by either capturing at least one 
crucial stone, or by preventing the defender from creating an eye in the 
region. 

■ Safe attacker stones which surround the region together with crucial
defender stones.

Figure 1 shows an example of a one-eye problem. Black is the defender and
White is the attacker. Crucial stones are marked by triangles and the region is
marked by crosses. Black must make an eye inside the region, while White tries
to prevent that. There are unsafe stones at C6, E7, and H6. If these stones
are captured, a player might play at such a point later, so they are part of
the region.

Figure 1. Example of a one-eye problem (Black to live).

2.1 Related Work on Tsume-Go 

Wolf's (1994) GoTools is the currently best tsume-Go solver that specializes
in solving completely enclosed positions. GoTools contains a sophisticated
evaluation function that includes dynamic aspects, powerful rules
for life-and-death recognition, and learning dynamic move ordering from the
search (Wolf, 2000). Most competitive Go programs also contain a tsume-Go
module. The commercial database Tsume-Go Goliath uses a proof-number
search engine to check the user's inputs.

3. Df-pn: Depth-First Proof-Number Search 

In this section we give an overview of the standard df-pn algorithm. Nagai’s 
(2002) thesis is available for a detailed explanation. 

3.1 Proof and Disproof Numbers 

Proof and disproof numbers and Allis' proof-number search (PNS) (Allis,
Van der Meulen, and Van den Herik, 1994) are the basis of this algorithm. The
proof number of a node in an AND/OR tree is defined as the minimum number
of leaf nodes that must be proven to prove the node for the first player, while the
disproof number is the minimum number of leaf nodes that must be disproven
(proven a win for the second player) in order to disprove the node. Proof and
disproof numbers can be viewed as an estimate of how easy it is to prove or
disprove a tree.

Proof-number search (PNS) maintains a proof number and a disproof number
for each node. The leaf node to expand next is chosen in a best-first manner.
Starting from the root, PNS traverses the tree by continuously selecting a child
whose (dis)proof number is minimum at OR (AND) nodes, until it reaches a
leaf node called a most-proving node. PNS expands that node and recomputes
the proof and disproof numbers on the path to the root. This process continues
until the root is either proven or disproven.
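
A minimal sketch of these definitions, on an explicit AND/OR tree: proof numbers take the minimum over children at OR nodes and the sum at AND nodes, disproof numbers do the opposite, and the most-proving node is found by following the child with the smallest (dis)proof number. The `Node` class and function names are assumptions of this illustration. Unknown leaves get the value 1 for both numbers, matching the initialization that this paper describes for missing transposition-table entries.

```python
import math


class Node:
    def __init__(self, kind, children=(), value=None):
        self.kind = kind                # "OR" or "AND"
        self.children = list(children)
        self.value = value              # leaves: True, False, or None (unknown)
        self.pn = self.dn = 1


def update_numbers(node):
    """Recompute proof and disproof numbers bottom-up."""
    if not node.children:
        if node.value is True:
            node.pn, node.dn = 0, math.inf
        elif node.value is False:
            node.pn, node.dn = math.inf, 0
        else:
            node.pn, node.dn = 1, 1
        return
    for child in node.children:
        update_numbers(child)
    if node.kind == "OR":
        node.pn = min(c.pn for c in node.children)
        node.dn = sum(c.dn for c in node.children)
    else:
        node.pn = sum(c.pn for c in node.children)
        node.dn = min(c.dn for c in node.children)


def most_proving_node(node):
    """Descend along minimal proof (OR) or disproof (AND) numbers to a leaf."""
    while node.children:
        if node.kind == "OR":
            node = min(node.children, key=lambda c: c.pn)
        else:
            node = min(node.children, key=lambda c: c.dn)
    return node


leaf_a1, leaf_a2 = Node("OR"), Node("OR")
leaves_b = [Node("OR") for _ in range(3)]
a = Node("AND", [leaf_a1, leaf_a2])
b = Node("AND", leaves_b)
root = Node("OR", [a, b])
update_numbers(root)
print(root.pn, root.dn)  # 2 2
```

Here the root needs at least two leaves proven (through `a`), so PNS would next expand a leaf below `a`.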

3.2 The Df-pn Algorithm 

Df-pn (Nagai, 2002) turns PNS into a depth-first search algorithm by
generalizing ideas behind Seo's (1995) PN* algorithm. As a depth-first search,
df-pn can expand fewer interior nodes and use a smaller amount of memory than
PNS. Like PNS, it always expands a most-proving node.

Figure 2, adapted from Nagai (2002), presents pseudocode of the df-pn
algorithm. Df-pn utilizes two thresholds, one for proof numbers and one for
disproof numbers. For the sake of simplicity, the code is written in the negamax
form, because disproof numbers are a dual notion of proof numbers. For each
node $n$, two variables $\phi$ and $\delta$ are defined as follows:

$$n.\phi = \begin{cases} pn(n) & (n \text{ is an OR node}) \\ dn(n) & (n \text{ is an AND node}) \end{cases}$$

$$n.\delta = \begin{cases} dn(n) & (n \text{ is an OR node}) \\ pn(n) & (n \text{ is an AND node}) \end{cases}$$

While the iterative deepening method usually has a global threshold, df-pn's
thresholds work as local thresholds at each recursive call. This approach is
similar to recursive best-first search (Korf, 1993). The main function Df-pn
initializes both thresholds to infinity, and then calls the recursive function MID
that iterates over nodes. When returning from MID, the root node is either
proven or disproven. MID traverses the subtree below node $n$ in a depth-first
manner. It explores nodes while proof or disproof numbers do not exceed the
threshold, or until it finds a terminal node that determines a winner. In the code,
IsTerminal checks if $n$ is a terminal node, while WinforCurrentNode checks
whether a terminal node is a win or a loss. When a node $n$ is expanded, the
best child $n_c$ in terms of proof and disproof numbers is selected by SelectChild
for a recursive call to MID with the following new thresholds: $n_c.\delta$ is set to
the minimum of the current threshold for $n$ and the value at which $n$'s child with
the second smallest $\delta$ becomes the most-proving node during the exploration
of $n_c$'s subtree. Note that $n.\phi$ corresponds to $n_c.\delta$ because of the negamax
formulation. $n_c.\phi$ works like the cost function of the IDA* algorithm (Korf,
1985).

Because df-pn is an iterative-deepening method that expands interior nodes
again and again, the heart of the algorithm is the transposition table, a large
cache storing previous search efforts, i.e., proof and disproof numbers for
visited nodes. TTstore stores proof and disproof numbers of a node in the table.
TTlookup checks the table for information on proof and disproof numbers of a
node. If no result is found, both numbers are initialized to 1.

// Set up for the root node
int Df-pn(node r) {
    r.φ = ∞; r.δ = ∞;
    MID(r);
    if (r.δ = ∞)
        return win_for_root;
    else
        return loss_for_root;
}

// Iterative deepening at each node
void MID(node n) {
    TTlookup(n, φ, δ);
    if (n.φ ≤ φ || n.δ ≤ δ) {
        // Exceed thresholds
        n.φ = φ; n.δ = δ;
        return;
    }
    // Terminal node
    if (IsTerminal(n)) {
        if (WinforCurrentNode(n)) {
            n.φ = 0; n.δ = ∞;
            return;
        } else {
            n.φ = ∞; n.δ = 0;
            return;
        }
    }
    GenerateMoves(n);
    // Store larger proof and disproof
    // numbers to detect repetitions
    TTstore(n, n.φ, n.δ);
    // Iterative deepening
    while (n.φ > ΔMin(n) &&
           n.δ > ΦSum(n)) {
        n_c = SelectChild(n, φ_c, δ_c, δ_2);
        // Update thresholds
        n_c.φ = n.δ + φ_c − ΦSum(n);
        n_c.δ = min(n.φ, δ_2 + 1);
        MID(n_c);
    }
    // Store search results
    n.φ = ΔMin(n); n.δ = ΦSum(n);
    TTstore(n, n.φ, n.δ);
}

// Select the most promising child
node SelectChild(node n, int &φ_c,
                 int &δ_c, int &δ_2) {
    node n_best;
    δ_c = φ_c = ∞;
    for (each child n_child) {
        TTlookup(n_child, φ, δ);
        // Store the smallest and second
        // smallest δ in δ_c and δ_2
        if (δ < δ_c) {
            n_best = n_child;
            δ_2 = δ_c; φ_c = φ; δ_c = δ;
        }
        else if (δ < δ_2)
            δ_2 = δ;
        if (φ = ∞)
            return n_best;
    }
    return n_best;
}

// Compute the smallest δ of
// n's children
int ΔMin(node n) {
    int min = ∞;
    for (each child n_child) {
        TTlookup(n_child, φ, δ);
        min = min(min, δ);
    }
    return min;
}

// Compute sum of φ of n's children
int ΦSum(node n) {
    int sum = 0;
    for (each child n_child) {
        TTlookup(n_child, φ, δ);
        sum = sum + φ;
    }
    return sum;
}

Figure 2. Pseudocode of the df-pn algorithm.
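
To make the control flow of Figure 2 concrete, here is a small executable sketch of df-pn on an explicit finite game tree without repetitions. It is not the authors' implementation. The class name, the dictionary-based tree and the `terminal_win` map are assumptions of this illustration. Thresholds are passed as arguments rather than stored in the node, and the step that stores thresholds to detect repetitions is omitted because the sample trees contain none.

```python
import math

INF = math.inf


class DfPn:
    def __init__(self, children, terminal_win):
        self.children = children          # node -> list of child nodes
        self.terminal_win = terminal_win  # leaf -> True if the player to move wins
        self.table = {}                   # transposition table: node -> (phi, delta)

    def lookup(self, n):
        return self.table.get(n, (1, 1))  # unknown nodes start at (1, 1)

    def delta_min(self, n):
        return min((self.lookup(c)[1] for c in self.children[n]), default=INF)

    def phi_sum(self, n):
        return sum(self.lookup(c)[0] for c in self.children[n])

    def select_child(self, n):
        """Child with the smallest delta, its phi, and the second smallest delta."""
        best, phi_c, delta_c, delta_2 = None, INF, INF, INF
        for c in self.children[n]:
            phi, delta = self.lookup(c)
            if delta < delta_c:
                best, delta_2, phi_c, delta_c = c, delta_c, phi, delta
            elif delta < delta_2:
                delta_2 = delta
        return best, phi_c, delta_2

    def mid(self, n, th_phi, th_delta):
        phi, delta = self.lookup(n)
        if th_phi <= phi or th_delta <= delta:
            return                        # thresholds exceeded
        if not self.children[n]:          # terminal node
            self.table[n] = (0, INF) if self.terminal_win[n] else (INF, 0)
            return
        while th_phi > self.delta_min(n) and th_delta > self.phi_sum(n):
            child, phi_c, delta_2 = self.select_child(n)
            self.mid(child,
                     th_delta + phi_c - self.phi_sum(n),
                     min(th_phi, delta_2 + 1))
        self.table[n] = (self.delta_min(n), self.phi_sum(n))

    def solve(self, root):
        """Return True if the player to move at the root wins."""
        self.mid(root, INF, INF)
        return self.lookup(root)[1] == INF


TREE = {"R": ["A", "B"], "A": ["A1", "A2"], "B": ["B1"],
        "A1": [], "A2": [], "B1": []}
# At A1, A2 and B1 the root player is to move again.
print(DfPn(TREE, {"A1": True, "A2": False, "B1": True}).solve("R"))   # True
print(DfPn(TREE, {"A1": True, "A2": False, "B1": False}).solve("R"))  # False
```

In the first position the root wins through B, whose only continuation is a win; in the second, every root move leaves the opponent a winning reply.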

4. Computing Proof and Disproof Numbers in Domains 
with Repetitions 

When we tried to apply df-pn to the one-eye problem in Go, df-pn could not
solve some easy problems. The standard df-pn algorithm has a fundamental
problem when applied to a domain with repetitions. Figure 3 shows an example.
Assume F is unknown; then the df-pn algorithm computes pn(E) = pn(A) +
pn(F). Hence, pn(E) is larger than pn(A). Df-pn's termination condition is
(see Figure 2):

$$n.\phi \le \Delta\mathrm{Min}(n) \;\;||\;\; n.\delta \le \Phi\mathrm{Sum}(n)$$

Usually the threshold of the proof number is only a little bit larger than
pn(A) when exploring A's subtree in df-pn. Therefore, assuming that df-pn
reaches E, df-pn exceeds the proof number threshold, stops expanding and updates
A's proof number to pn(E) = pn(A) + pn(F).

Even if E is chosen in a later iteration, this phenomenon continues and F is
never explored. These repetitions often happen in Go, because passes are
allowed. Two consecutive passes lead back to the same position in a short loop.

Adding proof numbers from an ancestor to a node seems intuitively bad, 
since it leads to double-counting of the leaf nodes below. In our solution to this 
problem, we classify the children of a node into two types. A field minimal 
distance (md) of a node n is initially set to the length of the shortest path from 
the root to n, the depth of n in the search tree. We call a child n % normal if 
rii.md > n.md , and old if rii.md < n.md. Among the children n\ • • • , of 
n, let ni • • • , rii (1 < l < k) be the normal and n/+i, • • • , the old children. 
We modify the computation of proof and disproof numbers in the following 
way: 




Figure 3. A problem with repetitions in df-pn.



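$$ n.\phi = \min_{1 \le i \le k} n_i.\delta $$

$$ n.\delta = \begin{cases}
\sum_{i=1}^{l} n_i.\phi & \text{(if } \sum_{i=1}^{l} n_i.\phi \ne 0\text{)} \\
\max_{l+1 \le i \le k} n_i.\phi & \text{(if } \sum_{i=1}^{l} n_i.\phi = 0\text{)}
\end{cases} $$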
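
The normal/old distinction can be sketched in a few lines of Python. This is
only an illustration: the function name and the tuple layout are hypothetical,
and the sketch assumes that, when the φ values of the normal children sum to
zero, δ falls back to the largest φ among the old children.

```python
import math

INF = math.inf


def modified_phi_delta(n_md, children):
    """Return (phi, delta) of a node with minimal distance n_md.

    children is a list of (md, phi, delta) tuples, one per child
    (a hypothetical layout chosen for this sketch).
    """
    normal = [c for c in children if c[0] > n_md]   # n_i.md > n.md
    old = [c for c in children if c[0] <= n_md]     # n_i.md <= n.md
    phi = min(delta for _, _, delta in children)    # minimum over all children
    normal_sum = sum(p for _, p, _ in normal)
    if normal_sum != 0 or not old:
        return phi, normal_sum                      # old children are ignored
    # Assumed fallback: only old children are left to contribute.
    return phi, max(p for _, p, _ in old)


# A node at depth 4 with one normal child (md 5) and one old child (md 0).
unsolved = modified_phi_delta(4, [(5, 1, 1), (0, 7, 3)])
# The same node after the normal child no longer contributes (phi = 0).
only_old = modified_phi_delta(4, [(5, 0, INF), (0, 7, 3)])
```

In the first call the old child is ignored in the sum; in the second call its
value is used because no normal child contributes any more.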



Figure 4 illustrates an example of computing proof numbers. If F is neither
proven nor disproven, then F’s proof number cannot be 0. Therefore we ignore
A to compute E’s proof number, since A is an old child.

When a node has only old children, since all normal (and possibly some
old) children have been solved, that node itself must be considered old,
since now there is no way to prove or disprove it without exploring old
nodes. Therefore, the md of that node must be updated. We set it to the
minimum of the md fields of the currently unsolved old children.

Figure 4. Df-pn with minimal distance md.

Figures 5 and 6 depict an example of updating md. In this figure, assuming 
that G is proven, E now has only an old child to explore, because F is also 
proven. In that case E’s minimal distance is updated to A’s distance, and pn(E)
becomes pn(A). Further, C.md is set to E.md (see Figure 6). As a result,
pn(C) is now ignored in the computation of the proof number of C’s parent,
since C has become an old child.

Figure 5. Updating E’s minimal distance.    Figure 6. Computing C’s minimal distance.

Dealing with overcounting proof numbers caused by repetitions was essential
to make df-pn work in Go. We note that Nagai (2002) achieves impressive
results with his tsume-shogi solver, and described the GHI problem, which
can lead to incorrect results in positions involving cycles. However, the
overcounting problem was not described in his papers. One possibility is that
although the same problem could happen in shogi, it might happen much less
often than in Go. Search in Go can easily return to identical states, for example
by consecutive pass moves. Another possibility is that this problem tends to
happen less frequently with additional search enhancements. Because Nagai’s
tsume-shogi solver is enhanced with a great deal of domain-dependent
knowledge, it might not occur in his case in practice. However, in a personal
communication the existence of this problem in shogi was confirmed by
Tsuruoka and Maruyama of team Gekisashi. As well, Sakuta found that df-pn
did not work better than PDS (Nagai, 1999) in his tsume-shogi solver, and gave
as a possible explanation the occurrence of DCGs (Sakuta, 2001).

5. Application of Df-pn to the One-eye Problem 

Below we apply the df-pn algorithm to the one-eye problem. We start with the 
basic one-eye algorithm (5.1). Then we provide several game-specific search 
enhancements (5.2). The section is concluded by a simulation (5.3). 

5.1 The Basic One-eye Algorithm 

The basic algorithm, due to Anders Kierulf, is quite simple, and has been 
used as part of the tsume-Go search in the program Explorer for many years. 
It detects single-point eyes and false eyes. 

The algorithm checks for all points in the region whether they are a potential
eye point for the defender. Eyes are created by either surrounding empty points
or by capturing attacker stones. If a safe eye connected to the crucial stones can
be created in the region, the defender wins. If there is no potential eye space in
the region, the attacker wins.

Whether a point E is a potential eye point is computed as follows: 

■ E occupied by unsafe attacker stone: yes. 

■ E occupied by safe attacker stone: no. 

■ E occupied by defender stone: no. 

■ E is empty: check the neighbours and the diagonal neighbours of E. 

- Some direct neighbour is occupied by the attacker: no. 

- E is at the edge of the board and at least one diagonal neighbour
  contains a safe attacker: no.

- At least two diagonal neighbours contain a safe attacker: no. 

- Otherwise: yes. 
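
The rules above translate almost directly into code. The following Python
sketch is a hypothetical illustration, not the implementation used in Explorer:
the board encoding ('D' defender, 'A' safe attacker, 'a' unsafe attacker,
'.' empty) and the function name are assumptions, and the safety of attacker
stones is taken as given input.

```python
def is_potential_eye_point(board, r, c):
    """Apply the potential-eye-point rules to point (r, c).

    board is a list of equal-length strings using the assumed encoding
    'D' (defender), 'A' (safe attacker), 'a' (unsafe attacker), '.' (empty).
    """
    rows, cols = len(board), len(board[0])
    cell = board[r][c]
    if cell == "a":
        return True            # unsafe attacker stone: may be captured
    if cell in ("A", "D"):
        return False           # safe attacker stone or defender stone

    def on_board(y, x):
        return 0 <= y < rows and 0 <= x < cols

    direct = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
    diagonal = [(r - 1, c - 1), (r - 1, c + 1), (r + 1, c - 1), (r + 1, c + 1)]
    # Some direct neighbour is occupied by the attacker: no.
    if any(on_board(y, x) and board[y][x] in ("a", "A") for y, x in direct):
        return False
    safe_diagonals = sum(
        1 for y, x in diagonal if on_board(y, x) and board[y][x] == "A"
    )
    at_edge = r in (0, rows - 1) or c in (0, cols - 1)
    if at_edge and safe_diagonals >= 1:
        return False
    return safe_diagonals < 2


centre_ok = is_potential_eye_point(["DDD", "D.D", "DDA"], 1, 1)
centre_bad = is_potential_eye_point(["ADD", "D.D", "DDA"], 1, 1)
edge_bad = is_potential_eye_point(["D.D", "DDA", "DDD"], 0, 1)
```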

A potential eye point is a safe eye if all direct neighbours and all but one
diagonal neighbour are occupied by defender stones. All diagonal neighbours are
needed at the edge of the board. A safe eye is a defender win if the surrounding
block is connected to crucial stones, and all crucial stones are connected. The
search generates all moves in the region, unless there are forced moves (see
below).

5.2 Game-specific Search Enhancements 

Safety by Connections to Safe Stones. Connectivity is a fundamental as- 
pect of the game of Go. Most Go programs recognize connected blocks. We 
use connections to promote unsafe attacker stones to safe, and to prove that a 
defender eye is connected to crucial stones. Both types of connections help to 
reduce the search depth. 

Our current implementation recognizes simple miai strategies (Müller, 1997)
and some protected liberties for connections. Figure 7 gives examples of the
strategy. In the left diagram, White has two ways (A and B) to connect. Even 
if Black plays first, the white block marked with squares can connect to safe 
stones. The stone at F6 is also safe now, because it has a connection either at C 
or at D. Since there is no eye space, this position can be statically evaluated as 
a loss for Black. Similarly, in the right diagram in Figure 7, the connection at E 
or F guarantees a win for Black. The algorithm to compute these connections 
is straightforward. It checks if safe blocks S have two liberties to connect to a 
block b. If this is the case, b is included in S and the two liberties are marked 
to not be used for other connections. The process continues until no further 
blocks can be added to S. 
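
The iterative growth of the safe set S can be sketched as follows. This Python
fragment is a simplified illustration under stated assumptions: blocks are
given by name together with their liberty sets, a liberty shared by a block and
the safe set is treated as a connection point, and the names grow_safe_set,
liberties and used are hypothetical.

```python
def grow_safe_set(liberties, safe):
    """Add blocks that share two unused liberties with the safe set.

    liberties maps a block name to the set of its liberty points;
    safe is the initial collection of safe block names.
    Returns the final safe set and the liberties reserved for connections.
    """
    safe = set(safe)
    used = set()          # liberties already reserved for a connection
    changed = True
    while changed:
        changed = False
        safe_libs = set().union(*(liberties[s] for s in safe)) - used
        for block in sorted(liberties):
            if block in safe:
                continue
            shared = sorted(liberties[block] & safe_libs)
            if len(shared) >= 2:
                safe.add(block)           # two ways to connect (miai)
                used.update(shared[:2])   # reserve both liberties
                changed = True
                break                     # recompute the safe liberties
    return safe, used


example = {
    "S": {"A", "B", "C"},   # initially safe block
    "x": {"A", "B", "E"},   # can connect at A or B
    "y": {"B", "E", "F"},   # B is already reserved, only E remains
    "z": {"Q"},             # no shared liberty
}
safe_blocks, reserved = grow_safe_set(example, {"S"})
```

Reserving the two liberties prevents the same pair of connection points from
being counted for two different blocks.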
Figure 7. Connections to safe stones.

We find more safe stones by recognizing some forms of protected liberties.
Figure 8 shows an example. The stone marked with a square has only one
connection point at B to a safe white block. However, this connection is safe
since the stone has another liberty and the opponent cannot play at B.

Figure 8. Connection to safe stones on protected liberty.

Forced Moves. Forced moves are a safe form of pruning when one player
threatens to win immediately. We defined two kinds of forced moves, forced
attacker moves and forced defender moves, which correspond to ip1 or gi1
threats in Abstract Proof Search.

Figure 9. Forced moves.


The first type of forced move is on a point where the defender could complete 
an eye that is connected to the crucial stones. The left position in Figure 9 
presents an example. Black can make an eye at A. White must play at A to stop 
an immediate win for Black. 

The second type of forced move is defined as follows: 



1 There is no empty eye space for the defender in the region. 






2 There is exactly one unsafe attacker’s block b. 

3 b has a single-move connection to safe stones. If the defender plays any 
other move, the attacker can connect b to safety, leaving the defender 
with no potential eye points. 

For instance, in the right position of Figure 9 the move at B is forced. 

Forced moves give a large reduction of the search space by decreasing the 
branching factor. 

5.3 Simulation 

Simulation was invented by Kawano (1996) to solve effectively positions with
useless interposing piece drops in tsume-shogi problems. Later, Tanase (2000)
extensively applied this idea to his αβ-search engine to reduce the overhead of
calling the tsume-shogi solver inside the normal search. Assume that P is a
proven position and Q is a “similar” one we want to prove. Simulation borrows
moves from P’s proof tree to try to find a quick proof of Q. A dual notion
called dual simulation can be used to disprove a position.
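
The idea of borrowing moves from a proof tree can be sketched on a toy game.
The Python code below is only an illustration: the proof-tree encoding, the
function simulate and the token-taking game used to exercise it are assumptions
for this example and are unrelated to Go or to the solver described here.

```python
def simulate(pos, proof, prover_to_move, legal_moves, play, prover_wins):
    """Try to prove pos by replaying the proof tree of a similar position.

    proof is None (nothing recorded), a pair (move, subtree) where the
    prover moves, or a dict {reply: subtree} where the opponent moves.
    """
    if prover_wins(pos, prover_to_move):
        return True
    if proof is None:
        return False          # P's proof tree is exhausted
    if prover_to_move:
        move, subtree = proof
        if move not in legal_moves(pos):
            return False      # borrowed move is not playable in Q
        return simulate(play(pos, move), subtree, False,
                        legal_moves, play, prover_wins)
    replies = legal_moves(pos)
    if not replies:
        return False
    # Every reply in Q must be answered by the borrowed proof tree.
    return all(
        reply in proof
        and simulate(play(pos, reply), proof[reply], True,
                     legal_moves, play, prover_wins)
        for reply in replies
    )


# Toy game: a position is (tokens, tag); a move takes 1 or 2 tokens and the
# player who takes the last token wins. The tag has no effect on play.
def legal_moves(pos):
    return [m for m in (1, 2) if m <= pos[0]]


def play(pos, move):
    return (pos[0] - move, pos[1])


def prover_wins(pos, prover_to_move):
    return pos[0] == 0 and not prover_to_move


# Proof tree of P = (4, "p"): take 1, then answer 1 with 2 and 2 with 1.
proof_of_p = (1, {1: (2, None), 2: (1, None)})
similar = simulate((4, "q"), proof_of_p, True, legal_moves, play, prover_wins)
different = simulate((7, "q"), proof_of_p, True, legal_moves, play, prover_wins)
```

A failed simulation, as for the second position, proves nothing; the normal
search simply continues from there.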

In our solver, we apply simulation and dual simulation as follows: 

■ At an AND node n, if one of n’s children, n_c, is proven at some point in
the search, apply simulation to all unsolved children of n.

■ Similarly, at an OR node n, apply dual simulation if one of n’s children 
is disproven. 

This use of simulation is much more extensive than in tsume-shogi. See the 
experimental section for a discussion. 

6. Ko and Ko Threats 

Sometimes the outcome of a one-eye problem depends on ko. It is therefore 
important to model ko threats and ko recaptures in the search algorithm. 

The approach taken in GoTools can require several searches (Wolf, 1994). 
The parameter to each search is how many ko recaptures are allowed for a 
specified kowinner. 

Our current implementation allows only two options: one is to disallow any 
immediate ko recaptures; the other is to always allow ko recaptures for the 
designated kowinner. We search in one or two phases. The first search of a 
position, phase 1, disallows immediate ko recaptures, but marks nodes where 
such moves exist. If the search result depends on marked nodes, in phase 2 
a re-search is performed. The loser of the phase 1 search is the designated 
kowinner for phase 2. 

Phase 2 reuses the contents of the transposition table from phase 1. The
following implementation of the transposition table aims to reduce the amount
of re-search:

1 The Zobrist (1970) hash function is modified to account for a stone
captured in the previous ko capture, to differentiate identical positions with
different histories.

2 Two flags, one for each colour, in each transposition table entry keep
track of any possible ko captures in the subtree below that node. If there
is a ko capture for a player, the flag for the other player is set to indicate
that we will allow a ko recapture after that node in a re-search. When a
node n is proven (similarly for disproven), flags are set as follows:

■ If n is an OR node and n_c is n’s proven child, n’s flags are set the
same as n_c’s flags.

■ If n is an AND node and the flag of one of the children is set, n’s
flag is set. Otherwise, n’s flag is cleared.

In the phase 2 re-search, many phase 1 (dis)proofs can be reused. For ex- 
ample, assume that a node is proven and the flag for the kowinner is not set. 
Then we can use the proof from the transposition table. Similarly, we can also 
reuse disproofs. Even for nodes that are not proven or disproven, the proof 
and disproof numbers from phase 1 are valuable information for directing the 
re-search. 
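
The flag rules and the reuse test can be written down compactly. The following
Python sketch is a hypothetical illustration: flags are modelled as a dictionary
with one boolean per colour, the AND-node rule is assumed to apply to each
colour separately, and the function names are not taken from the solver.

```python
COLOURS = ("B", "W")


def or_node_flags(proven_child_flags):
    """OR node: copy the flags of the proven child."""
    return dict(proven_child_flags)


def and_node_flags(children_flags):
    """AND node: a flag is set if it is set in any child, else cleared."""
    return {
        colour: any(flags[colour] for flags in children_flags)
        for colour in COLOURS
    }


def can_reuse_proof(proven, flags, kowinner):
    """A phase 1 proof is reusable if the kowinner's flag is not set."""
    return proven and not flags[kowinner]


child_1 = {"B": False, "W": False}
child_2 = {"B": False, "W": True}
parent = and_node_flags([child_1, child_2])
reusable_for_black = can_reuse_proof(True, parent, "B")
reusable_for_white = can_reuse_proof(True, parent, "W")
```

In this example the proof stays valid when Black is the kowinner, but must be
re-searched when White is the kowinner.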

Re-searches usually have a low overhead, since we keep the previous results 
in the transposition table and reuse the table entries in most cases. However, if 
the solution changes dramatically by ko compared to the solution from the first 
search, a higher overhead results. 

7. Empirical Results 

This section consists of: test data (7.1), setup of experiments (7.2), test runs 
(7.3), and further comments on the experiments (7.4). 

7.1 Test Data 

In contrast to full tsume-Go, for which many large collections of test problems 
are available, we could not find any specialized collection of one-eye problems. 
Our current set of 70 test positions was created mainly by the authors. The 
problems can be played for both colours going first, resulting in a total of 140 
problems. All problems are of the following form: a black group already has 
one safe eye, and is completely surrounded at a distance by safe white stones. 
The area in between forms the region, and the fate of the black group depends 
on whether it can form a second eye in the region. Problems of this kind are 
also suitable for solution by a general tsume-Go solver, since making one eye 
is equivalent to solving the tsume-Go problem. 

The test set is available at http://www.cs.ualberta.ca/~games/go/oneeye.
The problems include a mix of easy and hard problems. Some problems are
challenging only for one colour playing first, and are very easy if the
other colour plays first. Some of the positions are hard to solve for current
tsume-Go programs. For an example, see Figure 10.

Figure 10. Example of a hard problem (Black to live).

7.2 Setup of Experiments 

All experiments were performed on a Pentium III/700 MHz with a 100 MB
transposition table. The time limit was 5 minutes per problem.

The following abbreviations are used for the methods and enhancements 
described above. 

■ Df-pn: The basic df-pn algorithm. 

■ MIN: Minimal distance modification for computing proof and disproof 
numbers. 

■ AC: Connections to safe stones for attacker 

■ DC: Connections to crucial stones for defender 

■ FAM: Forced attacker’s moves 

■ FDM: Forced defender’s moves 

■ SIM: Simulation and dual simulation 

7.3 Test Runs 

Adding Enhancements. Table 1 shows the results on the test set, starting
with basic df-pn and switching on enhancements one by one. The total execution
time and number of nodes expanded were computed using the subset of 126
problems that are all solved by methods (2) - (7) in the table.

     Enhancements   Number of   Total time (sec)   Total nodes expanded   Nodes expanded
     used           problems    (126 problems)     (126 problems)         per second
                    solved
(1)  Df-pn            20              -                       -                  -
(2)  (1) + MIN       126            806              11,933,976             14,806
(3)  (2) + AC        132            424               5,431,557             12,810
(4)  (3) + DC        132            444               5,377,408             12,116
(5)  (4) + FDM       132            436               5,142,100             11,802
(6)  (5) + FAM       133            113               1,354,506             11,970
(7)  (6) + SIM       134             81               1,168,683             14,347

Table 1. Performance for successively switching on enhancements.

The table shows the importance of the MIN modification. The only prob- 
lems solved by basic df-pn were very easy ones that needed at most 400 nodes. 

Search speed decreases a little with more enhancements, but improves again 
with simulation. Simulation provides a fast way to generate moves, faster than 
our current normal move generator, which has some overhead such as checking 
connections. 

Leave-One-Out Experiments. The results for switching off a single en- 
hancement at a time are shown in Table 2. 



Enhancement   Number of   Total time (s)    Total nodes expanded   Nodes expanded
turned off    problems    (129 problems)    (129 problems)         per second
              solved
MIN              74             -                      -                  -
AC              129           393              7,096,603             18,058
DC              134           138              2,081,344             15,070
FDM             134           264              3,705,778             14,052
FAM             133           402              5,590,511             13,907
SIM             133           175              2,123,969             12,137

Table 2. Performance for turning off single enhancements.



Performance of Simulation. Table 3 shows the performance data for sim- 
ulation in phase 1 searches. Since the method is applied in a very basic way, 
45.2 % success seems to be a good initial result, with plenty of room for further 
refinements. 



Total nodes   Nodes by SIM          SIM calls   Successful calls
6,265,984     1,116,386 (17.8 %)    262,628     118,706 (45.2 %)

Table 3. Performance data on simulation for all 134 solved problems. All
enhancements on. Phase 1 searches only.

Re-searches for Ko. Table 4 shows a summary of the overhead incurred by
re-searches for ko. In phase 1, immediate ko recaptures are not allowed. Phase
2 consists of the re-searches with a designated kowinner. The results in this
table are also with all enhancements.



Total nodes (134 problems)
Phase 1                 Phase 2
6,265,984 (95.4 %)      304,107 (4.6 %)

Table 4. Overheads for ko re-searches.



The overhead is quite small, but of course this is mainly a property of the
test set used, which contains only a few cases with complex ko fights. In the
worst case encountered, problem oneeyeb.10.sgf with Black to play, phase 1
took 7,340 nodes and phase 2 took 11,728 nodes.

7.4 Further Comments on the Experiments 

Reexpansion of Interior Nodes. One concern in df-pn is the overhead of 
reexpansion of interior nodes. In our experiments, the ratio of interior nodes 
expanded to total nodes is about 30 %. In Seo’s experiments in shogi, this 
ratio was about 20 %. Since information achieved dynamically is usually more 
reliable than static evaluations, we think that our 30 % is still a very small price 
to pay to achieve more cut-offs. 

Currently Unsolved Problems. Our solver currently cannot solve 6 problems
in our test suite. Figure 11 shows an example. All unsolved problems feature
large regions with many possible moves. Besides, some problems such as those
in Figures 10 and 11 stretch the limits of the one-eye problem toward related
problems such as semeai and tsume-Go. Figure 11, for example, can be seen as
a problem of whether white stones adjacent to black crucial stones can make
two eyes or not, having no split between the one-eye and tsume-Go problems.
As well, the practical limit of our current solver seems to be at around 18
empty points, which compares favourably with about 14 empty points reported
for GoTools. However, we need further investigations to assess this limit and
improve the ability of our solver for difficult problems.

Figure 11. Black to play and live: A currently unsolved problem.

8. Conclusions and Future Work

The early results of our work on applying df-pn to Go and specifically to 
the one-eye problem are very encouraging. There are numerous possible en- 
hancements, both for improving the search algorithm and for adding Go-specific 
knowledge. Examples are recognizing larger eyes, refining the knowledge about 
connections, generalizing forced moves similar to Cazenave’s APS, heuristic 
initialization of proof and disproof numbers, and search in open-ended areas. 

To apply these ideas to other problems in Go is also an interesting research 
topic. Examples include full tsume-Go (two-eye problems), tactical capture 
search and connection search. 

8.1 Comparison with Related Programs

We would like to compare our program with general tsume-Go solvers to 
assess its performance. However, it is hard to make a fair comparison since 
our algorithm solves only a restricted problem. Evaluation for two eyes is 
much harder than for one eye, and many years of hard work have gone into the 
development of the Go knowledge in programs such as GoTools. However, 
we believe that as a search algorithm our modified df-pn works very well for Go. 
In informal experiments it seems that our algorithm can already solve harder 
problems in our test set than other programs. One possible advantage of the 
df-pn algorithm is that it uses the transposition table more extensively. Only 
solved positions are saved in the transposition table in GoTools (Wolf, 2000), 
while in df-pn proof and disproof numbers of previous iterations are stored in 
the transposition table to improve the order of tree expansion (Nagai, 2002). 

8.2 The GHI Problem in Df-pn 

So far in this paper, we have not addressed the graph history interaction 
(GHI) problem (Palay, 1985). This problem occurred in our experiments, for 
example in double or triple ko situations. If GHI is ignored, incorrect results 
are stored in the transposition table. We developed a new approach that differs 
from the one described in Breuker et al. (2001) for the case of proof-number 
search. The method will be described in a forthcoming publication (Kishimoto
and Müller, 2003).

Abstract: This paper presents a learning system for scoring final positions in the Game
of Go. Our system learns to predict life and death from labelled game records. 
98.9% of the positions are scored correctly and nearly all incorrectly scored posi- 
tions are recognized. By providing reliable score information our system opens 
the large source of Go knowledge implicitly available in human game records, 
thus paving the way for a successful application of machine learning in Go. 

Keywords: Go, learning, neural net, scoring, game records, life and death 

1. Introduction 

Evaluating Go positions is one of the hardest tasks in Artificial Intelligence 
(AI). In the last decades, stimulated by Ing’s million-dollar prize for the first
computer program to defeat a professional Go player (which has expired
unchallenged), Go has received significant attention from AI research (Bouzy and
Cazenave, 2001; Müller, 2002). Yet, despite all efforts, the best computer Go
programs are still no match even for human amateurs of only moderate skill.
Partially this is due to the complexity of Go, which makes brute-force search
techniques infeasible on the 19 x 19 board. However, on the 9 x 9 board, which
has a complexity between Chess and Othello (Bouzy and Cazenave, 2001), the
current Go programs perform nearly as badly. The main reason lies in the lack of
good positional evaluation functions. Many (if not all) of the current top pro- 
grams rely on (huge) static knowledge bases derived from the programmers’ 
Go skills and Go knowledge. As a consequence the top programs are extremely 
complex and difficult to improve. In principle a learning system should be able 
to overcome this problem. 

In the past decade several researchers have used machine-learning techniques 
in Go. After Tesauro’s (1995) success story many researchers, including Dahl 
(2001), Enzenberger (1996) and Schraudolph et al. (1994), have applied
Temporal Difference (TD) learning for learning evaluation functions. Although
TD-learning is a promising technique, which was underlined by NeuroGo’s
latest performance at the 21st Century Championship Cup (Myers, 2002), there 
has not been a major breakthrough, such as in Backgammon, and we believe that 
this will remain unlikely to happen in the near future as long as most learning 
is done from self-play or against weak opponents. 

Over centuries humans have acquired extensive knowledge of Go. Since that 
knowledge is implicitly available in the games of human experts, it should be 
possible to apply machine-learning techniques to extract that knowledge from 
game records. So far game records have only been used successfully for move 
prediction (Enderton, 1991; Dahl, 2001; van der Werf et al., 2002). However, 
we are convinced that much more can be learned from these game records. 

One of the best sources of game records on the Internet is the No Name Go 
Server game archive (NNGS, 2002). NNGS is a free on-line Go club where 
people from all over the world can meet and play Go. All games played on 
NNGS since 1995 are available on-line. Although NNGS game records contain 
a wealth of information, the automated extraction of knowledge from these 
games is a non-trivial task at least for the following three reasons. 

Missing Information. Life-and-death status of blocks is not available. In 
scored games only a single numeric value representing the difference 
in points is available. 

Unfinished Games. Not all games are scored. Human games often end by one 
side resigning or abandoning the game without finishing it, which often 
leaves the status of large parts of the board unclear. 

Bad Moves. During the game mistakes are made which are hard to detect. 
Since mistakes break the chain of optimal moves it can be misleading 
(and incorrect from a game-theoretical point of view) to relate positions 
before the mistake to the final outcome of the game. 

The first step toward making the knowledge in the game records accessible is 
to obtain reliable scores at the end of the game. Reliable scores are obtained by 
correct classification of life-and-death stones on the board. This paper focuses 
on determining life and death for final positions. By focusing on final positions 
we avoid the problem of unfinished games and bad moves during the game, 
which will have to be dealt with later. 

It has been pointed out by Müller (1997) that proving the score of final
positions is a hard task. For a set of typical human final positions, Müller showed
that a combination of complex static analysis and search still leaves around
75% of the board-points unproven. Heuristic classification of his program 
Explorer classified most blocks correctly, but still left some regions unsettled 
(and to be played out further). Although this may be appropriate for computer- 
computer games it can be annoying in human-computer games, especially under 
the Japanese rules which penalize playing more stones than necessary.

Since proving the score of most final positions is not (yet) an option, we focus
on learning a heuristic classification. We believe that a learning algorithm for 
scoring final positions is important because: 1) it provides a more flexible 
framework than the traditional hand-coded static knowledge bases, and 2) it 
is a necessary first step toward learning to evaluate non-final positions. In 
general such an algorithm is good to have because: 1) large numbers of game 
records are hard to score manually, 2) publicly available programs still make 
too many mistakes scoring final positions, and 3) it can avoid unnecessarily 
long human-computer games. 

The rest of this paper is organised as follows. Section 2 discusses the scoring 
method. Section 3 presents the learning task. Section 4 introduces the repre- 
sentation. Section 5 provides details about the dataset. Section 6 reports our 
experiments. Finally, section 7 presents our conclusions. 

2. The Scoring Method 

The two main scoring methods in Go are territory scoring and area scoring. 
Territory scoring, used by the Japanese rules, counts the surrounded territory 
plus the number of captured opponent stones. Area scoring, used by the Chinese 
rules, counts the surrounded territory plus the alive stones on the board. The 
result of the two methods is usually the same up to one point. The result may
differ when one player placed more stones than the other, for three possible
reasons: (1) because Black made the first and the last move, (2) because one
side passed more often during the game, and (3) because of handicap stones.
(Under Japanese rules the score may also differ because territory surrounded
by alive stones in seki is not counted.) In this paper area scoring is used since
it is the simplest scoring method to implement for computers.

Area scoring works as follows: First, the life-and-death status of blocks 
of connected stones is determined. Second, dead stones are removed from 
the board. Third, each empty point is marked Black, White, or neutral (the 
non-empty points are already marked by their colour). The empty points can 
be marked by flood filling or by distance. Flood filling recursively marks 
empty points with their adjacent colour. In the case that a flood fill for Black
overlaps with a flood fill for White the overlapping region becomes neutral. 
(As a consequence all non-neutral empty regions must be completely enclosed 
by one colour.) Scoring by distance marks each point based on the distance 
toward the nearest remaining black or white stone(s). If the point is closer to a 
black stone it is marked black, if the point is closer to a white stone it is marked 
white, otherwise (if the distance is equal) the point does not affect the score 
and is marked neutral. Finally, the difference between black and white points, 
together with a possible komi, determines the outcome of the game.

In final positions scoring by flood filling and scoring by distance should
give the same result. If the result is not the same, there are large open regions 
with unsettled interior points, which usually means that some stones should 
have been removed or some points could still be gained by playing further. 
Comparing flood filling with scoring by distance is therefore a useful check to 
detect whether the game is finished and scored correctly. 
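
The two marking methods and the consistency check can be sketched in Python.
The code is a minimal illustration under several assumptions: dead stones have
already been removed, the board is a list of strings using 'B', 'W' and '.',
komi is ignored, and distance is taken to be Manhattan distance, which the text
does not specify. The function names are hypothetical.

```python
def flood_fill_score(board):
    """Area score (Black minus White) with empty regions marked by flood fill."""
    rows, cols = len(board), len(board[0])
    score = sum(row.count("B") - row.count("W") for row in board)
    seen = set()
    for r in range(rows):
        for c in range(cols):
            if board[r][c] != "." or (r, c) in seen:
                continue
            stack, size, borders = [(r, c)], 0, set()
            seen.add((r, c))
            while stack:
                y, x = stack.pop()
                size += 1
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if not (0 <= ny < rows and 0 <= nx < cols):
                        continue
                    if board[ny][nx] != ".":
                        borders.add(board[ny][nx])
                    elif (ny, nx) not in seen:
                        seen.add((ny, nx))
                        stack.append((ny, nx))
            if borders == {"B"}:
                score += size     # region enclosed by Black only
            elif borders == {"W"}:
                score -= size     # region enclosed by White only
    return score


def distance_score(board):
    """Area score with empty points marked by the nearest stone (Manhattan)."""
    stones = [(r, c, s) for r, row in enumerate(board)
              for c, s in enumerate(row) if s != "."]
    score = sum(1 if s == "B" else -1 for _, _, s in stones)
    for r, row in enumerate(board):
        for c, s in enumerate(row):
            if s != ".":
                continue
            dist = {"B": float("inf"), "W": float("inf")}
            for y, x, colour in stones:
                dist[colour] = min(dist[colour], abs(y - r) + abs(x - c))
            if dist["B"] < dist["W"]:
                score += 1
            elif dist["W"] < dist["B"]:
                score -= 1
    return score


def looks_finished(board):
    """The check described above: both methods must agree."""
    return flood_fill_score(board) == distance_score(board)


finished = [".BW", ".BW", ".BW"]
unfinished = ["BB.", "...", "..W"]
```

On the first board both methods give the same score, while on the second, open
board they disagree, which signals that the position is not settled.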

3. The Learning Task 

The task of learning to score comes down to learning to determine which 
blocks of connected stones are dead and should be removed from the board. 
This is learned from a set of labelled final positions, for which the labels contain 
the colour controlling each point. A straightforward implementation would be 
to learn to classify all blocks based on the labelled points. However, for some 
blocks this is not a good idea because their status can be irrelevant and forcing
them to be classified just complicates the learning task.

The only blocks required for a correct score are either alive and at the border of
their area, or dead in the opponent’s area. This is illustrated by Figure 1. Here all
marked stones must be classified. The stones marked by triangles must be
classified alive. The stones marked by squares must be classified dead. The
unmarked stones are irrelevant for scoring because they are not at the border of
their area and their possible capturability does not affect the score. For example,
the two black stones in the top-left corner kill the white block and are in Black’s
area. However, they can always be captured by White, so forcing them to be
classified as alive or dead is misleading and even unnecessary. (The stones in the
bottom-left corner are alive in seki because neither side can capture. The two
white stones in the upper-right corner are adjacent to two neutral points and
therefore also at the border of White’s region.)

Figure 1. Blocks to classify.

3.1 Recursion 

Usually blocks of stones are not alive on their own. Instead they form chains 
or groups which are only alive in combination with other blocks. Their status 
also may depend on the status of neighbouring blocks of the opponent, i.e., 
blocks can live by capturing the opponent. (Although one might be tempted to 
conclude that life and death should be dealt with at the level of groups this does 
not really help because the human notion of a group is not well defined, difficult 
to program, and may even require an underlying notion of life and death.)

Because life and death of blocks is strongly related to the life and death of
other blocks, the status of other (usually nearby) blocks has to be taken into
account. Partially this can be done by including features for nearby blocks
in the representation. In addition, it seems natural to consider a recursive 
framework for classification which employs the predictions for other blocks 
to improve performance iteratively. In our implementation this is done by 
training a cascade of classifiers which use previous predictions for other blocks 
as additional input features. 
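
The cascade idea can be illustrated with a minimal sketch. It assumes each stage is simply a callable that maps a feature vector to a value in [0, 1]; the names `cascade_predict`, `base`, and `neighbours`, and the min/max summary of neighbouring predictions, are hypothetical choices for illustration and are not taken from the original implementation.

```python
from typing import Callable, Dict, List, Sequence

# A stage maps a feature vector to a predicted value in [0, 1] (1 = alive).
Stage = Callable[[Sequence[float]], float]


def cascade_predict(stages: List[Stage],
                    base: Dict[str, List[float]],
                    neighbours: Dict[str, List[str]],
                    default: float = 0.5) -> Dict[str, float]:
    """Stage 0 sees only the base features; every later stage also sees
    the previous stage's predictions for the neighbouring blocks."""
    # Direct (non-recursive) prediction from the base features only.
    preds = {name: stages[0](feats) for name, feats in base.items()}
    for stage in stages[1:]:
        new_preds = {}
        for name, feats in base.items():
            prev = [preds[other] for other in neighbours.get(name, [])]
            # Fixed-length summary of the neighbours' previous predictions
            # (an assumption made for this sketch).
            extra = [min(prev), max(prev)] if prev else [default, default]
            new_preds[name] = stage(list(feats) + extra)
        preds = new_preds
    return preds
```

Each stage only reads the predictions of the stage before it, so the number of stages corresponds to the number of recursive steps.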

4. Representation 

In this section we present the representation of blocks for classification.
Several representations are possible and used in the field. The most primitive
representations typically employ the raw board directly. A straightforward im-
plementation is to concatenate three bitboards into a feature vector, for which
the first bitboard contains the block to be classified, the second bitboard con-
tains other friendly blocks and the third bitboard contains the enemy blocks.
Although this representation is complete, in the sense that all relevant informa-
tion is preserved, it is unlikely to be efficient because of the high dimensionality
and lack of topological structure.
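
As an illustration of the raw representation, the following sketch builds the concatenated feature vector. The function name `raw_features` and the use of (row, column) tuples for points are assumptions made for this example.

```python
from typing import List, Set, Tuple

Point = Tuple[int, int]


def raw_features(size: int, block: Set[Point], friendly: Set[Point],
                 enemy: Set[Point]) -> List[int]:
    """Concatenate three size*size bitboards into one 0/1 feature vector."""
    def bitboard(stones: Set[Point]) -> List[int]:
        # Row-major scan of the board: 1 where a stone of the set is present.
        return [1 if (r, c) in stones else 0
                for r in range(size) for c in range(size)]

    return bitboard(block) + bitboard(friendly) + bitboard(enemy)
```

On the 9x9 board this gives 3 x 81 = 243 binary features, which shows the high dimensionality mentioned above.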

4.1 Features for Block Classification 

A more efficient representation employs a set of features based on simple 
measurable geometric properties, some elementary Go knowledge and some 
hand-crafted specialised features. Several of these features are typically used 
in Go programs to evaluate positions (Chen and Chen, 1999; Fotland, 2002). 
The features are calculated for single friendly and opponent blocks, multiple 
blocks in chains, and colour-enclosed regions (CERs). 

For each block our representation consists of the following features: (All 
features are single scalar values unless stated otherwise.) 

- Size measured in occupied points. 

- Perimeter measured in number of adjacent points, including points over 
the edge. 

- Opponents are the occupied adjacent points. 

- (First order) liberties are the free adjacent points. 

- Protected liberties are the liberties which cannot be played by the oppo- 
nent, because of suicide or being directly capturable. 

- Auto-atari liberties are liberties which, when played, reduce the liberties
of the block from 2 to 1, which means that the block would become
directly capturable (such liberties are protected for an adjacent opponent
block).

- Second-order liberties are the liberties of (first-order) liberties (excluding
the first-order liberties).

- Third-order liberties are the liberties of second-order liberties (excluding 
first- and second-order liberties). 

- Adjacent opponent blocks 

- Local majority is the number of opponent stones minus the number of 
friendly stones within a Manhattan distance of 2 from the block. 

- Centre of mass represented by the distance to the closest and second- 
closest edge. 

- Bounding box size is the number of points in the smallest rectangular box 
that can contain the block. 
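
A few of the simpler features in this list can be computed as in the sketch below. It assumes the board is given as a list of strings with `X` and `O` for stones and `.` for empty points, and it counts the perimeter as the number of distinct adjacent points; both are assumptions for illustration, and `block_features` is a hypothetical name.

```python
from typing import Dict, List, Set, Tuple

Point = Tuple[int, int]


def adjacent(p: Point) -> List[Point]:
    r, c = p
    return [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]


def block_features(board: List[str], start: Point) -> Dict[str, int]:
    """Compute some simple features of the block containing `start`."""
    n = len(board)

    def at(p: Point) -> str:
        # "#" marks a point over the edge of the board.
        return board[p[0]][p[1]] if 0 <= p[0] < n and 0 <= p[1] < n else "#"

    colour = at(start)
    # Collect the block: same-coloured stones connected along the lines.
    block: Set[Point] = {start}
    frontier = [start]
    while frontier:
        for q in adjacent(frontier.pop()):
            if at(q) == colour and q not in block:
                block.add(q)
                frontier.append(q)

    # Adjacent points, including points over the edge.
    border = {q for p in block for q in adjacent(p)} - block
    liberties = {q for q in border if at(q) == "."}
    opponents = {q for q in border if at(q) not in (".", "#")}
    # Empty neighbours of liberties, excluding the first-order liberties.
    second = {q for p in liberties for q in adjacent(p) if at(q) == "."} - liberties
    rows = [r for r, _ in block]
    cols = [c for _, c in block]
    return {
        "size": len(block),
        "perimeter": len(border),
        "opponents": len(opponents),
        "liberties": len(liberties),
        "second_order_liberties": len(second),
        "bounding_box": (max(rows) - min(rows) + 1) * (max(cols) - min(cols) + 1),
    }
```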

Adjacent to each block are colour-enclosed regions. CERs consist of con-
nected empty and occupied points, surrounded by stones of one colour or the
edge. It is important to know whether an adjacent CER is fully accessible,
because a fully accessible CER surrounded by safe blocks provides at least one
sure liberty. To detect fully accessible regions we use so-called miai strategies
as applied by Müller (1997). In contrast to Müller’s original implementation we
also add miai-accessible interior empty points to the set of accessible liberties,
and also use protected liberties for the chaining. For fully accessible CERs we
include:

- Number of regions 

- Size 

- Perimeter 

- Split points are crucial points for preserving connectedness in the local 
3x3 window around the point. (The region could still be connected by 
a big loop outside the local 3x3 window.) 

For partially accessible CERs we include: 

- Number of partially accessible regions 

- Accessible size 

- Accessible perimeter 

- Size of the inaccessible interior.

- Perimeter of the inaccessible interior.

- Split points of the inaccessible interior.

The size, perimeter and number of split points are summed for all regions. 
We do not address individual regions because the representation must have a 
fixed number of features, whereas the number of regions is not fixed. 

Another way to analyse CERs is to look for possible eyespace. Points form-
ing the eyespace should be empty or contain capturable opponent stones. Empty
points directly adjacent to opponent stones are not part of the eyespace. Points
on the edge with one or more diagonally adjacent alive opponent stones and
points with two or more diagonally adjacent alive opponent stones are false
eyes. False eyes are not part of the eyespace (we ignore the unlikely case where
a big loop upgrades false eyes to true eyes). Initially we assume all diagonally
adjacent opponent stones to be alive. However, in the recursive framework (see
below) the eyespace is updated based on the status of the diagonally adjacent
opponent stones after each iteration. For directly adjacent eyespace of the block
we include:

- Size 

- Perimeter 
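
The false-eye rule stated above can be written down directly. This is a sketch under assumptions: the board is a list of strings (`X`, `O`, `.`), `alive` is a hypothetical mapping from stone points to their current predicted status, and stones missing from it are taken to be alive, as in the initial assumption described in the text.

```python
from typing import Dict, List, Optional, Tuple

Point = Tuple[int, int]


def is_false_eye(board: List[str], point: Point, owner: str,
                 alive: Optional[Dict[Point, bool]] = None) -> bool:
    """Apply the diagonal rule: one alive diagonal opponent stone is enough
    on the edge, two are needed elsewhere."""
    n = len(board)
    r, c = point
    on_edge = r in (0, n - 1) or c in (0, n - 1)
    count = 0
    for dr, dc in ((r - 1, c - 1), (r - 1, c + 1), (r + 1, c - 1), (r + 1, c + 1)):
        if 0 <= dr < n and 0 <= dc < n and board[dr][dc] not in (".", owner):
            # Opponent stones count only while they are considered alive.
            if alive is None or alive.get((dr, dc), True):
                count += 1
    return count >= (1 if on_edge else 2)
```

Passing updated `alive` values after each iteration mirrors the way the eyespace is revised in the recursive framework.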

Since we are dealing with final positions it is often possible to use the opti-
mistic assumption that all blocks with shared liberties can form a chain (during
the game this assumption is dangerous because the chain may be split). For
this so-called optimistic chain we include:

- Number of blocks 

- Size 

- Perimeter 

- Split points 

- Adjacent CERs 

- Adjacent CERs with eyespace 

- Adjacent CERs, fully accessible from at least one block.

- Size of adjacent eyespace 

- Perimeter of adjacent eyespace 

- External opponent liberties are liberties of adjacent opponent blocks 
which are not accessible from the optimistic chain. 

Adjacent to the block in question there may be opponent blocks. For the 
weakest (measured by the number of liberties) directly adjacent opponent block 
we include: 

- Perimeter 

- Liberties 

- Shared liberties 

- Split points 

- Perimeter of adjacent eyespace 

The same features are also included for the second-weakest directly adjacent
opponent block and the weakest opponent block directly adjacent to or sharing
liberties with the optimistic chain of the block in question.

By comparing a flood fill starting from Black with a flood fill starting from 
White we find unsettled empty regions which are disputed territory (assuming 
all blocks are alive). If the block is adjacent to disputed territory we include: 

- Direct liberties in disputed territory. 

- Liberties of all friendly blocks in disputed territory. 

- Liberties of all enemy blocks in disputed territory. 
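
One way to read the flood-fill comparison is sketched below: an empty point is disputed when it can be reached through empty points from stones of both colours. The board encoding (list of strings with `X`, `O`, `.`) and the function names are assumptions for this example.

```python
from typing import List, Set, Tuple

Point = Tuple[int, int]


def reachable_empty(board: List[str], colour: str) -> Set[Point]:
    """Flood fill over empty points, starting from all stones of `colour`."""
    n = len(board)
    seen: Set[Point] = set()
    stack = [(r, c) for r in range(n) for c in range(n) if board[r][c] == colour]
    while stack:
        r, c = stack.pop()
        for q in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= q[0] < n and 0 <= q[1] < n and board[q[0]][q[1]] == "." \
                    and q not in seen:
                seen.add(q)
                stack.append(q)
    return seen


def disputed_territory(board: List[str]) -> Set[Point]:
    """Empty points reached by both fills (all blocks assumed alive)."""
    return reachable_empty(board, "X") & reachable_empty(board, "O")
```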

4.2 Additional Features for Recursive Classification

For the recursive classification the following six additional features are used: 

- Predicted value of the strongest friendly block with a shared liberty. 

- Predicted value of the weakest adjacent opponent block. 

- Predicted value of the second-weakest adjacent opponent block. 

- Average predicted value of the weakest opponent block’s optimistic chain. 

- Adjacent eyespace size of the weakest opponent block’s optimistic chain. 

- Adjacent eyespace perimeter of the weakest opponent block’s optimistic 
chain. 

Next to these additional features the predictions are also used to update the 
eyespace, i.e., dead blocks can become eyespace for the side that captures, alive 
blocks cannot provide eyespace, and diagonally adjacent dead opponent stones 
are not counted for detecting false eyes. 

5. The Data Set 

In the experiments we used game records obtained from the NNGS archive 
(NNGS, 2002). All games were played on the 9x9 board between 1995 and 
2002. We only considered games which are played to the end and scored, thus 
ignoring unfinished or resigned games. Since the game records only contain a 
single numeric value for the score, we had to find a way to label all blocks. 

5.1 Scoring the Data Set 

For scoring the dataset we initially used a combination of GnuGo and 
manual labelling. Although GnuGo has the option to finish games and label 
blocks the program could not be used without human supervision. The reasons 
for this are bugs, the inherent complexity of the task, and the mistakes made by 
weak human players which ended the game in positions that are not final, or 
scored them incorrectly. Fortunately, nearly all mistakes are easily detected by 
comparing GnuGo’s scores and labelled boards with the numeric scores stored 
in the game records. As an extra check all boards containing open regions with 
unsettled interior points (where flood filling does not give the same result as 
distance-based scoring) were also inspected manually. 
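
The consistency check between a labelled board and the recorded numeric score can be sketched as follows. The sketch assumes plain area scoring by flood fill (stones plus empty regions bordered by a single colour), ignores komi, and uses a hypothetical board encoding with `X` for Black and `O` for White; it does not reproduce distance-based scoring.

```python
from typing import FrozenSet, List, Tuple

Point = Tuple[int, int]


def area_score(board: List[str], dead: FrozenSet[Point] = frozenset()) -> int:
    """Black minus White area after removing the stones labelled dead."""
    n = len(board)
    grid = [["." if (r, c) in dead else board[r][c] for c in range(n)]
            for r in range(n)]
    score = {"X": 0, "O": 0}
    seen = set()
    for r in range(n):
        for c in range(n):
            if grid[r][c] != ".":
                score[grid[r][c]] += 1
            elif (r, c) not in seen:
                # Flood fill one empty region and record bordering colours.
                region, borders, stack = 0, set(), [(r, c)]
                seen.add((r, c))
                while stack:
                    pr, pc = stack.pop()
                    region += 1
                    for q in ((pr - 1, pc), (pr + 1, pc), (pr, pc - 1), (pr, pc + 1)):
                        if 0 <= q[0] < n and 0 <= q[1] < n:
                            if grid[q[0]][q[1]] == ".":
                                if q not in seen:
                                    seen.add(q)
                                    stack.append(q)
                            else:
                                borders.add(grid[q[0]][q[1]])
                if len(borders) == 1:
                    score[borders.pop()] += region
    return score["X"] - score["O"]


def label_is_consistent(board: List[str], dead: FrozenSet[Point],
                        recorded: int) -> bool:
    """True when the labelling reproduces the score stored in the record."""
    return area_score(board, dead) == recorded
```

A mismatch flags the position for manual inspection; a match does not prove that the labelling is right, only that it is consistent with the record.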

Since the scores did not match in many positions the labelling proved to
be very time consuming. We therefore only used GnuGo to label the games
played in 2002 and 1995. With the 2002 games a classifier was trained. When
we tested the performance on the 1995 games it outperformed GnuGo’s la-
belling. Therefore our classifier replaced GnuGo for labelling all other
games (1996-2001), retraining it each time a new year was labelled. Although
this sped up the process it still required a fair amount of human intervention,
mainly because of games that contained incorrect scores in their game record.

A few hundred games had to be thrown out completely because they were
not finished, contained illegal moves, contained no moves at all (for at least one
side), or both sides were played by the same player. In a small number of cases,
where the last moves would have been trivial but not actually played, we made
the last few moves manually.

Eventually we ended up with a dataset containing 18,222 final positions. 
Around 10% of these games were scored incorrectly (by the players) and were 
inspected manually. (Actually the number of games we inspected is signif- 
icantly higher because of the games that were thrown out and because our 
initial classifiers and GnuGo made mistakes). On average the final positions 
contained 5.8 alive blocks, 1.9 dead blocks, and 2.7 irrelevant blocks. (In the 
case that one player gets the full board all his blocks were assumed irrelevant 
although at least one block should of course be classified as alive.) 

Since the Go scores on the 9x9 board range from -81 to +81 the chances
of an incorrect labelling leading to a correct score are low; nevertheless it could
not be ruled out completely. On randomly inspecting an additional 1% of the
positions we found none that were labelled incorrectly. Finally, when all games
were labelled, we re-inspected all positions for which our best classifier seemed 
to predict an incorrect score. This final pass detected 42 positions (0.2%) which 
were labelled incorrectly, mostly because our initial classifiers had made the 
same mistakes as the players that scored the games. 

5.2 Statistics 

Since many game records contained incorrect scores we looked for reasons 
and gathered statistics. The first thing that came to mind is that weak players 
might not know how to score. Therefore in Figure 2 the percentage of incorrectly 
scored games related to the strength of the players is shown. (Although in each 
game only one side may have been responsible for the incorrect score, we always 
assigned blame to both sides.) The two marker types distinguish between rated 
and unrated players. Although unrated players have a value for their rating, it
is an indication given by the player and not by the server. Only after a player
has played sufficiently many games does the server assign a rating.

Although a significant number of games are scored incorrectly this is usually 
not considered a problem when the winner is correct. (Players typically forget 
to remove some stones when they are far ahead.) Figure 3 shows how often 
incorrect scoring by rated players converts a win to a loss. 

Figure 2. Incorrect scores.

Figure 3. Incorrect winners.

It should be noted that the percentages in Figures 2 and 3 were weighted over
all games regardless of the player. Therefore they do not necessarily reflect the
probabilities for individual players, i.e., the statistics can be dominated by a
small group of players that played many games. This group at least contains
some computer players which have a tendency to get robbed of their points
in the scoring phase. We therefore also calculated some statistics that were
normalised over individual players. For rated players the average probability
of scoring a game incorrectly is 4.2%, the probability of cheating (the incorrect
score converts loss to win) is 0.66%, and the probability of getting cheated is
0.55%. For unrated players the average probability of scoring a game incorrectly
is 11.2%, the probability of cheating is 2.1%, and the probability of getting
cheated is 1.1%. The fact that the probability of getting cheated is lower than
the probability of cheating is the result of a small group of players (several of
which are computer programs) that systematically lose points in the scoring
phase, and a larger group of players that take advantage of them.
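
The difference between statistics weighted over all games and statistics normalised over individual players can be made concrete with a small sketch. The input format, a list of (player, scored_incorrectly) pairs with one entry per player per game, and the name `error_rates` are assumptions for illustration; the numbers in the usage note are invented and are not data from the paper.

```python
from collections import defaultdict
from typing import Iterable, Tuple


def error_rates(games: Iterable[Tuple[str, bool]]) -> Tuple[float, float]:
    """Return (rate weighted over games, rate normalised over players)."""
    per_player = defaultdict(lambda: [0, 0])  # player -> [incorrect, total]
    for player, wrong in games:
        per_player[player][0] += int(wrong)
        per_player[player][1] += 1
    total_games = sum(total for _, total in per_player.values())
    # Every game counts equally, so frequent players dominate.
    weighted = sum(wrong for wrong, _ in per_player.values()) / total_games
    # Every player counts equally, whatever the number of games played.
    normalised = sum(wrong / total for wrong, total in per_player.values()) / len(per_player)
    return weighted, normalised
```

For example, one player with 8 incorrect scores in 10 games and two players with 5 correctly scored games each give a game-weighted rate of 0.4 but a player-normalised rate of about 0.27.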

6. Experiments 

In this section experimental results are presented for: (1) selecting a classifier,
(2) performance of the representation, (3) recursive performance, (4) full board
performance, and (5) performance on the 19 x 19 board. Unless stated otherwise
the various training and validation sets, used in the experiments, were extracted 
from games played between 1996 and 2002. The test set was always the same, 
containing 7149 labelled blocks extracted from 919 games played in 1995. 

6.1 Selecting a Classifier 

An important choice is selecting a good classifier. In pattern recognition there
is a wide range of classifiers to choose from (Jain et al., 2000). We tested a
number of well-known classifiers for their performance on datasets of 100, 1,000,
and 10,000 examples. The classifiers are: Nearest Mean Classifier (NMC),
Linear Discriminant Classifier (LDC), Logistic Linear Classifier (LOGLC),
Quadratic Discriminant Classifier (QDC), Nearest Neighbour Classifier (NNC),
K-Nearest Neighbours Classifier (KNNC), BackPropagation Neural net Clas-
sifier with momentum and adaptive learning (BPNC), Levenberg-Marquardt
Neural net Classifier (LMNC), and RProp Neural net Classifier (RPNC). Some
preliminary experiments with a Support Vector Classifier, Decision Tree Clas-
sifiers, a Parzen classifier and a Radial Basis Neural net Classifier were not
pursued further because of excessive training times and/or poor performance.
All classifiers except the neural net classifiers, for which we directly used the
standard MATLAB toolbox, were used as implemented in PRTools3 (Duin, 2000).

The results, shown in Table 1, indicate that performance first of all depends 
on the size of the training set. The linear classifiers perform better than the 
quadratic classifier and nearest neighbour classifiers. For large datasets train- 
ing KNNC is very slow because it takes a long time to find an optimal value of 
the parameter k. The number of classifications per second of (K)NNC is also 
low because of the large number of distances that must be computed (all train- 
ing examples are stored). Although the performance of the nearest neighbour 
classifiers might be improved by editing and condensing the dataset, we did not 
investigate them further. 



Classifier Training size  Training error (%)  Test error (%)  Training time (s)  Classification speed (s^-1)
NMC        100            2.8                 3.9             0.0                4.9 x 10^4
           1,000          4.0                 3.8             0.1                5.2 x 10^4
           10,000         3.8                 3.6             0.5                5.3 x 10^4
LDC        100            0.7                 3.0             0.0                5.1 x 10^4
           1,000          2.1                 2.0             0.1                5.2 x 10^4
           10,000         2.2                 1.9             0.9                5.3 x 10^4
LOGLC      100            0.0                 9.3             0.2                5.2 x 10^4
           1,000          0.0                 2.6             1.1                5.2 x 10^4
           10,000         1.0                 1.2             5.6                5.1 x 10^4
QDC        100            0.0                 13.7            0.1                3.1 x 10^4
           1,000          1.0                 2.1             0.1                3.2 x 10^4
           10,000         1.9                 2.1             1.1                3.2 x 10^4
NNC        100            0.0                 18.8            0.0                4.7 x 10^3
           1,000          0.0                 13.5            4.1                2.4 x 10^2
           10,000         0.0                 10.2            4.1 x 10^3         2.4 x 10^0
KNNC       100            7.2                 13.1            0.0                4.8 x 10^3
           1,000          4.2                 4.4             1.0 x 10^1         2.4 x 10^2
           10,000         2.8                 2.8             9.4 x 10^3         2.6 x 10^0
BPNC       100            0.5                 3.6             2.9                1.8 x 10^4
           1,000          0.2                 1.5             1.9 x 10^1         1.8 x 10^4
           10,000         0.5                 1.0             1.9 x 10^2         1.9 x 10^4
LMNC       100            2.2                 7.6             2.6 x 10^1         1.8 x 10^4
           1,000          0.7                 2.8             3.2 x 10^2         1.8 x 10^4
           10,000         0.5                 1.2             2.4 x 10^3         1.9 x 10^4
RPNC       100            1.5                 4.1             1.4                1.8 x 10^4
           1,000          0.2                 1.7             7.1                1.8 x 10^4
           10,000         0.4                 1.1             7.1 x 10^1         1.9 x 10^4

Table 1. Performance of classifiers.

The best classifiers are the neural network classifiers. It should however be
noted that their performance may be slightly overestimated with respect to the
size of the training set, because we used an additional validation set to stop
training (this was not possible for the other classifiers because they are not
trained incrementally). The Logistic Linear Classifier performs nearly as well
as the neural network classifiers, which is quite an achievement considering
that it is just a linear classifier.

The results of Table 1 were obtained with neural networks that employed one
hidden layer containing 15 neurons with hyperbolic tangent sigmoid transfer
functions. Since our choice of 15 neurons was quite arbitrary, a second exper-
iment was performed in which we varied the number of neurons in the hidden
layer. In Figure 4 results are shown for the RPNC. The classification errors
marked with triangles represent results for training on 5,000 examples; the stars
indicate results for training on 15,000 examples. The solid lines are measured on
the independent test set, whereas the dash-dotted lines are obtained on the
training set. The results show that even moderately sized networks easily overfit
the data.

Although the performance initially improves with the size of the network, it
seems to level off for networks with over 50 hidden neurons (the standard
deviation is around 0.1%). Again, clearly the key factor in improving performance
is increasing the training set.

6.2 Performance of the Representation 

In section 4 we claimed that a raw board representation is inefficient for 
predicting life and death. To validate this claim we measured the performance 
of such a representation and compared it to our specialised representation. 

The raw representation consists of three concatenated bitboards, for which 
the first bitboard contains the block to be classified, the second bitboard contains 
other friendly blocks and the third bitboard contains the enemy blocks. To 
remove symmetry the bitboards are rotated such that the centre of mass of the 
block to be classified is always in a single canonical region. 
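
The rotation step can be sketched as follows. The text does not specify the canonical region, so this example assumes one particular choice (the quarter of the board above and to the left of the centre, with half-open boundaries so that exactly one of the four rotations qualifies); `rotate90` and `canonical_rotation` are hypothetical names.

```python
from typing import List, Set, Tuple

Point = Tuple[int, int]


def rotate90(p: Point, n: int) -> Point:
    """Rotate a point by 90 degrees on an n x n board."""
    return (p[1], n - 1 - p[0])


def canonical_rotation(block: Set[Point], others: List[Set[Point]],
                       n: int) -> Tuple[Set[Point], List[Set[Point]]]:
    """Rotate all point sets together until the block's centre of mass
    lies in the assumed canonical region."""
    for _ in range(4):
        # Integer offsets of the centre of mass from the board centre,
        # scaled by 2 * len(block) to avoid floating-point comparisons.
        x = 2 * sum(r for r, _ in block) - (n - 1) * len(block)
        y = 2 * sum(c for _, c in block) - (n - 1) * len(block)
        if (x <= 0 and y < 0) or (x == 0 and y == 0):
            break
        block = {rotate90(p, n) for p in block}
        others = [{rotate90(p, n) for p in s} for s in others]
    return block, others
```

The same rotation is applied to the friendly and enemy bitboards, so the relative geometry of the position is preserved.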

Figure 4. Sizing the neural network for the RPNC.

Since high-dimensional feature spaces tend to raise several problems which
are not directly caused by the quality of the individual features we also tested two
compressed representations. These compressed representations were generated
by performing Principal Component Analysis (PCA) on the raw representation.
For the first PCA mapping the number of features was chosen identical to our
specialised representation. For the second PCA mapping the number of features
was set to preserve 90% of the total variance.
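
Choosing the number of components that preserves a given fraction of the variance can be sketched as below. The function assumes the eigenvalues of the covariance matrix (the variances along the principal components) have already been computed; `components_for_variance` is a hypothetical name.

```python
from typing import Sequence


def components_for_variance(eigenvalues: Sequence[float],
                            fraction: float = 0.90) -> int:
    """Smallest number of leading principal components whose eigenvalues
    sum to at least `fraction` of the total variance."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    running = 0.0
    for k, value in enumerate(ordered, start=1):
        running += value
        if running >= fraction * total:
            return k
    return len(ordered)
```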

The results, shown in Table 2, are obtained for the RPNC with 15, 35, and 75
neurons in the hidden layer, for training sets with 100, 1,000 and 10,000 exam-
ples. All values are averages over 11 runs with different training sets, validation
sets (same size as the training set), and random initialisations. The errors, mea-
sured on the test set, indicate that a raw representation alone requires too many
training examples to be useful in practice. Even with 10,000 training examples
the raw representation performs much worse than our specialised representa-
tion with only 100 training examples. Simple feature-extraction methods such
as Principal Component Analysis do not seem to improve performance, indicat-
ing that preserved variance of the raw representation is relatively insignificant
for determining life and death.



                          Test error (%)
Training size  Extractor  15 neurons  35 neurons  75 neurons
100            -          29.1        26.0        27.3
100            pca1       22.9        22.9        22.3
100            pca2       23.3        24.3        21.9
1,000          -          13.7        13.5        13.4
1,000          pca1       16.7        16.2        15.6
1,000          pca2       14.2        14.5        14.4
10,000         -          7.5         6.8         6.5
10,000         pca1       9.9         9.3         9.1
10,000         pca2       8.9         8.2         7.7

Table 2. Performance of the raw representation.



6.3 Recursive Performance 

Our recursive framework for classification is implemented as a cascade of 
classifiers which use extra features, based on previous predictions as discussed 
in subsection 4.2, as additional input. The performance measured on an inde- 
pendent test set for the first 4 steps is shown for various sizes of the training 
set in Table 3. The results are averages of 5 runs with randomly initialised 
networks containing 50 neurons in the hidden layer (the standard deviation is 
around 0.1 %). 

The results show that recursive predictions improve the performance. How-
ever, the only significant improvement comes from the first iteration. The im-
provements in the average 3- and 4-step errors are far from significant. The
reason for this is that sometimes the performance got stuck or even worsened
after the first iteration. Preliminary experiments suggest that large networks
were more likely to get stuck after the first iteration than small networks, which
might indicate some kind of overfitting. A possible solution to overcome this
problem is to retrain the networks a number of times, and pick the best based on
the performance on the validation set. If we do this the best networks, trained
on 100,000 training examples, achieve a 4-step error of 0.25%.



Training size  Direct error (%)  2-step error (%)  3-step error (%)  4-step error (%)
1,000          1.93              1.60              1.52              1.48
10,000         1.09              0.76              0.74              0.72
100,000        0.68              0.43              0.38              0.37

Table 3. Recursive performance.



6.4 Full Board Performance 

So far we have concentrated on the percentage of blocks that are classified 
correctly. Although this is an important measure it does not directly tell us how 
often boards will be scored correctly (a board may contain multiple incorrectly 
classified blocks). Further we do not yet know what the effect is on the score 
in number of board points. Therefore we tested our classifier on the full-board 
test positions (which were not used for training or validation). 

For our best 4-step classifier trained with 100,000 examples we found that 
1.1% of the boards were scored incorrectly. For 0.5% of the boards the winner 
was not identified correctly. The average number of incorrectly scored board 
points (using distance-based scoring) was 0.15, however in case a board is 
scored incorrectly this usually affects around 14 board points (which counts 
double in the numeric score). 

6.5 Performance on the 19 x 19 Board 

The experiments presented above were all performed on the 9 x 9 board
which, as was pointed out before, is a most challenging environment. Never-
theless, it is interesting to test whether the techniques scale up to the 19 x 19
board. We have not yet had the time to label large quantities of 19 x 19
games, so training directly on the 19 x 19 board was not an option. Despite
this we tested our classifiers, which were trained from blocks observed on the
9 x 9 board, on the problem set IGS_31_counted from the Computer Go Test
Collection. This set contains 31 labelled 19 x 19 games played by amateur dan
players, and was used by Müller (1997). On the 31 final positions our 4-step
classifier classified 5 blocks incorrectly (0.5% of all relevant blocks), and as a
consequence 2 final positions were scored incorrectly. The average number of
incorrectly scored board points was 2.1 (0.6%).

In his paper Müller stated that the heuristic classification of his program Ex-
plorer classified most blocks correctly. Although we do not know the exact
performance of Explorer we believe it is safe to say that our system, which
scored 99.4% of all board points correctly, is performing at least at a compa-
rable level. Furthermore, since our system was not even trained explicitly for
19 x 19 games there may still be significant room for improvement.

7. Conclusions 

We have developed a system that learns to score final positions from labelled
examples. On unseen game records our system scored around 98.9% of the
positions correctly without any human intervention. Compared to the average
rated player on NNGS (who for scored 9x9 games has a rating of 7 kyu) our
system is more accurate at removing all dead blocks, and performs comparably
on determining the correct winner.

By comparing numeric scores and counting unsettled interior points we can 
efficiently detect nearly all incorrectly scored final positions. Although some 
final positions are assessed incorrectly by our classifier, most are in fact scored 
incorrectly by the players. Detecting these games is important because most 
machine-learning methods require reliable training data for good performance. 

7.1 Future Work 

By providing reliable score information our system opens the large source of 
Go knowledge which is implicitly available in human game records. The next 
step will be to apply machine learning in non-final positions. We believe that 
the representation and techniques presented in this paper provide a solid basis 
for static predictions in non-final positions. 

The good performance of our system was obtained without any search, indi- 
cating that static evaluation is sufficient for most human final positions. Nev- 
ertheless, we believe that some (selective) search can still improve the perfor- 
mance. Adding selective features that involve search and integrating our system 
into Magog, our 9x9 Go program, will be an important next step. 

Although the performance of our system is already quite good for labelling 
game records, there are, at least in theory, still positions which may be scored 
incorrectly when our classifier makes the same mistake as the human players. 
Future work should determine how often this happens in practice. 

Another point where our system can be improved is the representation. Al- 
though the current representation performs adequately, some features may be 
redundant or correlated. Feature extraction, feature selection, and possibly 
adding some new features may improve performance even further. 

Abstract: We describe two Go programs, Olga and Oleg, developed by a Monte-Carlo
approach that is simpler than Bruegmann’s (1993) approach. Our method is
based on Abramson (1990). We performed experiments to assess ideas on (1)
progressive pruning, (2) the all-moves-as-first heuristic, (3) temperature, (4) simu-
lated annealing, and (5) depth-two tree search within the Monte-Carlo frame-
work. Progressive pruning and the all-moves-as-first heuristic are good speed-up
enhancements that do not deteriorate the level of the program too much. Then,
using a constant temperature is an adequate and simple heuristic that is about as
good as simulated annealing. The depth-two heuristic gives deceptive results at
the moment. The results of our Monte-Carlo programs against knowledge-based
programs on 9x9 boards are promising. Finally, the ever-increasing power of
computers leads us to think that Monte-Carlo approaches are worth considering
for computer Go in the future.

Keywords: Monte-Carlo approach, computer Go, heuristics 

1. Introduction 

We start with two observations. First, when termination of the search process 
of a game tree is possible, the process provides the best move and constitutes 
a proof on that best move. The process does not necessarily need domain- 
dependent knowledge but its cost is exponential in the search depth. Second, a 
domain-dependent move generator generally yields a good move, but without 
any verification. It costs nothing in execution time but the move generator 
remains incomplete and always contains errors. When considering the game of 
Go, these two observations are crucial. Global tree search is not possible in Go 
and knowledge-based Go programs are very difficult to improve (Bouzy and 
Cazenave, 2001). Therefore, this paper explores an intermediate approach in 







B. Bouzy, B. Helmstetter 



which a Go program performs a global search (not a global tree search) using
very little knowledge. This approach is based on statistics or, more specifically,
on Monte-Carlo methods. We believe that such an approach has neither
the drawback of global tree search with very little domain-dependent knowledge
(no termination), nor the drawback of domain-dependent move generation (no
verification). The statistical global search described in this paper terminates and
provides the move with a kind of verification. In this context, the paper claims
that statistical methods, or Monte-Carlo methods, are adequate for the game of Go.

To support our view, Section 2 describes related work about Monte-Carlo
methods applied to Go. Section 3 focuses on the main ideas underlying our
work. Then, Section 4 highlights the experiments to validate these ideas. Before
the conclusion, Section 5 discusses the relative merits of the statistical approach
and its variants along with promising perspectives.

2. Related Work 

At a practical level, the general meaning of Monte Carlo lies in the use of 
the random generator function, and for the theoretical level we refer to Fishman 
(1996). Monte-Carlo methods have already been used in computer games. In 
incomplete information games, such as poker (Billings et al., 2002), Scrabble
(Sheppard, 2002), and backgammon (Tesauro, 2002), this approach is natural: 
because the information possessed by your opponent is hidden, you want to sim- 
ulate this information. In complete information games, the idea of replacing 
complete information by randomized information is less natural. Nevertheless, 
it is not the first time that Monte-Carlo methods have been tried in complete 
information games. This section deals with two previous contributions (Abram- 
son, 1990; Bruegmann, 1993). 

2.1 Abramson’s Expected-Outcome 

Evaluating a position of a two-person complete information game with statis- 
tics was tried by Abramson (1990). He proposed the expected-outcome model, 
in which the proper evaluation of a game-tree node is the expected value of the 
game’s outcome given random play from that node on. The author showed that 
the expected outcome is a powerful heuristic. He concluded that the expected- 
outcome model of two-player games is “precise, accurate, easily estimable, ef- 
ficiently calculable, and domain-independent”. In 1990, he tried the expected- 
outcome model on the game of 6x6 Othello. The ever-increasing computer 
power enables us to use this model now for Go programs. 
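
The expected-outcome idea can be illustrated on a toy game. The sketch below uses a simple subtraction game (players alternately take 1 to 3 stones; whoever takes the last stone wins) purely as a stand-in, since neither Othello nor Go fits in a few lines; the game, the function names, and the fixed seed are all assumptions made for this example.

```python
import random


def random_playout(stones: int, to_move: int, rng: random.Random) -> int:
    """Play uniformly random moves to the end of the game.
    Returns +1 if player 0 takes the last stone, otherwise -1."""
    while True:
        stones -= rng.randint(1, min(3, stones))
        if stones == 0:
            return 1 if to_move == 0 else -1
        to_move = 1 - to_move


def expected_outcome(stones: int, to_move: int,
                     playouts: int = 1000, seed: int = 0) -> float:
    """Estimate the expected game outcome under random play from a node
    by averaging the results of many random games."""
    rng = random.Random(seed)
    results = (random_playout(stones, to_move, rng) for _ in range(playouts))
    return sum(results) / playouts
```

The estimate needs no domain knowledge beyond the rules, which is what makes the model domain-independent; its accuracy depends on the number of random games played.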

2.2 Bruegmann’s Monte-Carlo Go 

Bruegmann (1993) was the first to develop a Go program based on random 
games. The architecture of the program, Gobble, was remarkably simple. In 
order to choose a move in a given position, Gobble played a large number of 
almost random games from this position to the end, and scored them. Then, it 
evaluated a move by computing the average of the scores of the random games 
in which that move had been played. 

This idea is the basis of our work. Below we describe some issues of Gob- 
ble. In our work, described in Section 3, they are subject to improvements. 

1 No filling of the eyes. Moves that filled one’s eyes were forbidden. This 
was the sole domain-dependent knowledge used in Gobble. In the game 
of Go, the groups must have at least two eyes in order to be alive (with 
the relatively rare exception of groups living in seki). If the eyes could 
be filled, the groups would never live and the random games would not 
actually finish. However, the exact definition of an eye has its importance. 

2 Evaluation of the moves. Moves were evaluated according to the average 
score of the games in which they were played, not only at the beginning 
but at any stage of the game, provided that it was the first time one player 
had played at the intersection. This was justified by the fact that moves are 
often good independently of the stage at which they are played. However, 
this can turn out to be a fairly dangerous assumption. 

3 Selection of the moves. Moves were not chosen completely randomly, 
but rather on their current evaluation, good moves having more chances 
to be played first. Moreover, simulated annealing was used to control 
the probability that a move could be played out of order. The amount of 
randomness put in the games was controlled by the temperature ; it was set 
high at the beginning and gradually decreased. Thus, in the beginning, the 
games were almost completely random, and at the end they were almost 
completely determined by the evaluations of the moves. However, we 
will see that both are possible: (1) to fix the temperature to a constant 
value, and (2) to make the temperature even infinite, which means that 
all moves are played with equal probability. 

3. Our Work 

This section first describes the basic idea underlying our work (Subsection 
3.1). Then, it presents our Go programs, Olga and Oleg (Subsection 3.2). 
The only important domain-dependent consideration of the method, the defini- 
tion of eyes, is described in Subsection 3.3. Finally, in Subsection 3.4 a graph 
explaining the various possible enhancements to the basic idea is given. 

3.1 Basic Idea 

Though the architecture of the Gobble program was particularly simple, 
some points were subject to discussion. Our own algorithm for Monte-Carlo Go 
programs is an adaptation of Abramson’s (1990). The basic idea is to evaluate 
a position by playing a given number of completely random games to the end, 
without filling the eyes, and then scoring them. The evaluation corresponds to 
the mean of the scores of those random games. Choosing a move in a position 
means playing each of the moves and maximizing the evaluations of the positions 
obtained at depth 1. 
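
The basic idea can be sketched in a few lines. The following is a minimal 
illustration only, not the code of Olga or Oleg: Go is replaced by a toy 
number-picking game (an assumption made to keep the example self-contained), 
and the names legal_moves, play, random_game, evaluate and choose_move are 
hypothetical. 

```python
import random

# Toy stand-in for Go (an assumption for illustration): players alternately
# take one of the remaining numbers; the final score is Black's total minus
# White's total. A position is (remaining numbers, score so far, black_to_move).

def legal_moves(pos):
    remaining, _, _ = pos
    return list(range(len(remaining)))

def play(pos, move):
    remaining, score, black_to_move = pos
    value = remaining[move]
    rest = remaining[:move] + remaining[move + 1:]
    new_score = score + value if black_to_move else score - value
    return (rest, new_score, not black_to_move)

def random_game(pos, rng):
    """Play uniformly random moves to the end and return the final score."""
    while legal_moves(pos):
        pos = play(pos, rng.choice(legal_moves(pos)))
    return pos[1]

def evaluate(pos, n_games, rng):
    """Evaluation of a position: mean score of n_games random games from it."""
    return sum(random_game(pos, rng) for _ in range(n_games)) / n_games

def choose_move(pos, n_games, rng):
    """Depth-1 choice: evaluate every successor and keep the best for the mover."""
    sign = 1 if pos[2] else -1  # Black maximizes the score, White minimizes it
    return max(legal_moves(pos),
               key=lambda m: sign * evaluate(play(pos, m), n_games, rng))

rng = random.Random(0)
start = ((3, 9, 4, 1), 0, True)   # Black to move
best = choose_move(start, 2000, rng)
```

In a real Go program, random_game would additionally refuse moves that fill 
the mover's own eyes, as required by the basic idea. 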

3.2 Two Programs: OLGA and OLEG 

We developed two Go programs based on the basic idea above: Olga and 
Oleg. Olga and Oleg are far-fetched French acronyms for “ALeatoire GO” 
or “aLEatoire GO” that mean random Go. Olga was developed by Bouzy 
(2002) as a continuation of the Indigo development. The main idea was to use 
an approach with very little domain-dependent knowledge. At the beginning, 
the second idea in the Olga development was to concentrate on the speed of 
the updating of the objects relative to the rules of the game, which was not 
highlighted in the previous developments of Indigo. Of course, Olga uses 
code available in Indigo. 

Oleg was written by Helmstetter. Here, the main idea was to reproduce 
the Monte-Carlo Go experiments of Bruegmann (1993) to obtain a Go program 
with very little Go knowledge. Oleg uses the basic data structure of GnuGo 
that is already very well optimized by the GnuGo team (Bump, 2003). 

Both in Oleg and in Olga, the quality of play depends on the precision of 
the evaluations, which varies with the number of tests performed. The time to 
carry out these tests is proportional to the time spent to play one random game. 
On a 2 GHz computer, Olga plays 7,000 random 9x9 games per second and Oleg 
10,000. 

Because strings, liberties, and intersection accessibilities are updated incre- 
mentally during the random games, the number of moves per second is almost 
constant and the time to play a game is proportional to the board size. Since 
the precision of the expected value depends on the square root of the number of 
random games, there is no need to gain 20 per cent in speed, which would only 
bring about a 10-per-cent improvement in the precision. However, optimizing 
the program very roughly is important. A first pass of optimizations can gain a 
factor of 10, and the precision can be three times better in such a case, which is 
worthwhile. 
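
As a worked check of these figures, assume that the statistical error of a mean 
over $N$ random games scales as $1/\sqrt{N}$, so that playing $s$ times more 
games in the same time divides the error by $\sqrt{s}$: 

```latex
\[
\sqrt{1.2} \approx 1.095 \quad (\text{about a 10 per cent gain}), \qquad
\sqrt{10} \approx 3.16 \quad (\text{about three times better}).
\]
```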

Olga and Oleg share the basic idea and most of the enhancements that are 
described in Subsection 3.4. They are used to test the relative merits of each 
enhancement. However, each program uses its own definition of eyes. 

3.3 How to Define Eyes? 

The only domain-dependent knowledge required is the definition of an eye. 
It is important for the random program not to play a move in an eye. Without 
this rule, the random player would never make living groups and the games 
would never end. There are different ways to define “eyes” as precisely as 
possible with domain-dependent knowledge; see Fotland (2002) and Chen 
and Chen (1999). Our definitions are designed to be integrated into a random 
Go-playing program; they are simple and fast but not correct in some cases. 

In OLGA, an eye is an empty intersection surrounded by stones of one colour 
with two liberties or more. 

In Oleg, an eye is an empty intersection surrounded by stones belonging to 
the same string. 

The upside of both definitions is the speed of the programs. Oleg’s defini- 
tion is simpler and faster than Olga’s. Both approaches have the downside of 
being wrong in some cases. Oleg’s definition is very restrictive: Oleg’s eyes 
are actual true eyes but it may fill an actual eye surrounded by more than one 
string. Besides, Olga has a fuzzy and optimistic definition: it never fills an 
actual eye but, to connect its stones surrounding an Olga’s eye, Olga always 
expects one adjacent stone to be put into atari. 
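
The two definitions can be contrasted on a small board. The sketch below is an 
illustration under stated assumptions, not the code of either program: 
“surrounded” is read as “all orthogonally adjacent points”, the board is a 
list of strings with '.' for empty, 'X' for Black and 'O' for White, and the 
function names are hypothetical. Strings are recomputed by flood fill here, 
whereas the programs update them incrementally. 

```python
def neighbours(board, r, c):
    size = len(board)
    return [(r + dr, c + dc) for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1))
            if 0 <= r + dr < size and 0 <= c + dc < size]

def string_and_liberties(board, r, c):
    """Flood-fill the string containing the stone at (r, c)."""
    colour = board[r][c]
    stones, liberties, stack = {(r, c)}, set(), [(r, c)]
    while stack:
        for p in neighbours(board, *stack.pop()):
            if board[p[0]][p[1]] == '.':
                liberties.add(p)
            elif board[p[0]][p[1]] == colour and p not in stones:
                stones.add(p)
                stack.append(p)
    return frozenset(stones), liberties

def is_eye_oleg(board, r, c, colour):
    """Empty point whose neighbours all belong to one single string of colour."""
    if board[r][c] != '.':
        return False
    nbrs = neighbours(board, r, c)
    if any(board[a][b] != colour for a, b in nbrs):
        return False
    return len({string_and_liberties(board, a, b)[0] for a, b in nbrs}) == 1

def is_eye_olga(board, r, c, colour):
    """Empty point whose neighbours are colour stones with two liberties or more."""
    if board[r][c] != '.':
        return False
    return all(board[a][b] == colour
               and len(string_and_liberties(board, a, b)[1]) >= 2
               for a, b in neighbours(board, r, c))

one_string = [".X.",
              "XX.",
              "..."]
two_strings = [".X.",
               "X..",
               "..."]
```

On one_string both tests accept the corner point as an eye for 'X'; on 
two_strings only the Olga-style test does, which shows the more restrictive 
nature of the Oleg-style definition. 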

3.4 Various Possible Enhancements 

So far, we have identified a few possible enhancements from the basic idea. 
They are shown in Figure 1. This figure also shows the enhancements used by 
Oleg and Olga in their standard configurations. Two of the enhancements 
were already present in Gobble, namely the all moves as first heuristic (which 
means making statistics not only for the first move but for all moves of the ran- 
dom games) and simulated annealing. For the latter, an intermediate possibility 
can be adopted: instead of making the temperature vary during the game, we 
make it constant. 

With a view to speeding up the basic idea, an alternative to the all-moves- 
as-first heuristic is progressive pruning, in which only the first move of the 
random games is taken into account for the statistics, and the moves whose 
evaluation is too low compared to the best move are pruned. 

Making a minimax at depth 2 and evaluating the positions by making random 
games from this position naturally evolves from the basic idea. The expected 
result is an improvement of the program reading ability. For instance, it would 
suppress moves that work well only when the opponent does not respond. 

4. Experiments 

Starting from the basic idea, this section describes and evaluates the various 
enhancements: progressive pruning, all-moves-as-first heuristic, temperature, 
simulated annealing, and depth-two enhancements. 

Figure 1. Possible enhancements. 

For each enhancement, we set up experiments to assess its effect on the level 
of our programs. One experiment consists of a match of 100 games between the 
program to be assessed and the experiment reference program, each program 
playing 50 games with Black. In most experiments, the program to be assessed 
is a program in which one parameter varies, and the reference program is the 
same program with the parameter fixed to a reference value. In the other set 
of experiments, the program to be assessed uses the enhancement while the 
reference program does not. The result of an experiment is generally a set of 
relative scores provided by a table assuming that the program of the column is 
the max player. Given that the standard deviation of 9x9 games played by our 
programs is roughly 15 points, 100 games enable our experiments to lower 
$\sigma$ down to 1.5 points and to obtain a 95% confidence interval whose radius 
equals $2\sigma$, i.e., 3 points. We have used 2 GHz computers. When the response 
time of the assessed program varies with the experimental parameters, we 
mention it. Furthermore, no program in this work adopts a conservative or 
aggressive style depending on who is ahead in a game; they only try to 
maximize their own score. The score of a game is more significant than the 
winning percentage, which is consequently not included in the experiments’ 
results. We end this section with an assessment of Olga and Oleg against two 
existing knowledge-based programs, Indigo and GnuGo, by showing the results 
of an all-against-all tournament. 

4.1 Progressive Pruning 

As contained in the basic idea, each move has a mean value $m$, a standard 
deviation $\sigma$, a left expected outcome $m_l$ and a right expected outcome 
$m_r$. For a move, $m_l = m - \sigma r_d$ and $m_r = m + \sigma r_d$, where $r_d$ 
is a ratio fixed by practical experiments. A move $M_1$ is said to be 
statistically inferior to another move $M_2$ if $M_1.m_r < M_2.m_l$. Two moves 
$M_1$ and $M_2$ are statistically equal when $M_1.\sigma < \sigma_e$ and 
$M_2.\sigma < \sigma_e$ and neither move is statistically inferior to the other. 
$\sigma_e$ is called the standard deviation for equality, and its value is 
determined by experiments. 

In Progressive Pruning (PP), after a minimal number of random games (100 
per move), a move is pruned as soon as it is statistically inferior to another 
move. Therefore, the number of candidate moves decreases while the process 
is running. The process stops either when there is only one move left (this 
move is selected), or when the moves left are statistically equal, or when a 
maximal threshold of iterations is reached. In the last two cases, the move with 
the highest expected outcome is chosen. The maximal threshold is fixed to 
10,000 multiplied by the number of legal moves. This progressive pruning 
algorithm is similar to the one described in Billings et al. (2002). 
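
The procedure above can be sketched as follows. This is a minimal illustration 
of hard progressive pruning, not Olga's implementation. Two assumptions are 
made: the standard deviation attached to a move is taken to be the standard 
error of its mean (which is what allows it to fall below a small equality 
threshold), and each move is represented by a hypothetical sampler function 
that returns the score of one random game started with that move. 

```python
import math
import random

class MoveStats:
    """Running statistics of the random-game scores of one candidate move."""
    def __init__(self):
        self.n, self.total, self.total_sq = 0, 0.0, 0.0

    def add(self, score):
        self.n += 1
        self.total += score
        self.total_sq += score * score

    @property
    def mean(self):
        return self.total / self.n

    @property
    def sigma(self):
        # Assumption: sigma is the standard error of the mean.
        variance = max(self.total_sq / self.n - self.mean ** 2, 0.0)
        return math.sqrt(variance / self.n)

def progressive_pruning(samplers, r_d=1.0, sigma_e=0.2,
                        min_games=100, max_games_per_move=10_000):
    """samplers maps each move to a function returning one random-game score."""
    stats = {move: MoveStats() for move in samplers}
    candidates = list(samplers)
    if len(candidates) == 1:
        return candidates[0]
    budget = max_games_per_move * len(samplers)
    played = 0
    while len(candidates) > 1 and played < budget:
        for move in candidates:
            stats[move].add(samplers[move]())
        played += len(candidates)
        if stats[candidates[0]].n < min_games:
            continue  # no pruning before the minimal number of games
        best_left = max(stats[m].mean - r_d * stats[m].sigma for m in candidates)
        # Hard pruning: a move whose right outcome is below the best left
        # outcome is statistically inferior and never comes back.
        candidates = [m for m in candidates
                      if stats[m].mean + r_d * stats[m].sigma >= best_left]
        if all(stats[m].sigma < sigma_e for m in candidates):
            break  # the remaining moves are statistically equal
    return max(candidates, key=lambda m: stats[m].mean)

rng = random.Random(0)
samplers = {name: (lambda mu=mu: rng.gauss(mu, 15.0))
            for name, mu in (("a", 0.0), ("b", 10.0), ("c", -5.0))}
chosen = progressive_pruning(samplers)
```

With the synthetic samplers of the example, whose true means are 0, 10 and -5, 
the two weaker moves are expected to be pruned soon after the first 100 games. 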

Due to the increasing precision of mean evaluations while the process is 
running, the max value of the root is decreasing. Consequently, a move can 
be statistically inferior to the best one at a given time and not later. Thus, the 
pruning process can be either hard (a pruned move cannot be a candidate later 
on) or soft (a move pruned at a given time can be a candidate later on). Of 
course, soft PP is more precise than hard PP. Nevertheless, in the experiments 
shown here, Olga uses hard PP. 



The inferiority of one move compared to another, used for pruning, depends 
on the value of $r_d$. Theoretically, the greater $r_d$ is, the less pruned the 
moves are, and, as a consequence, the better the algorithm performs, but the 
slower it plays. The equality of moves, used to stop the algorithm, is 
conditioned by $\sigma_e$. Theoretically, the smaller $\sigma_e$ is, the fewer 
equalities there are, and the better the algorithm plays, but the more slowly. 
We set up experiments with different versions of Olga to obtain the best 
compromise between the time and the level of the program. The first set of 
experiments consisted in assessing the level and speed of Olga depending on 
$r_d$. OLGA($r_d$) played a set of games either with black or white against 
OLGA($r_d$=1). Table 1 shows the mean of the relative score of OLGA($r_d$) when 
$r_d$ varies from 1 up to 8. Both the minimal number of random games and the 
maximal threshold remain constant (100 and 10,000 respectively). 

This experiment shows that $r_d$ plays an important role in the move pruning 
process. Large values of $r_d$ correspond to the basic idea. To sum up, 
progressive pruning loses little strength compared to the basic idea, between 
five and ten points according to the value of $r_d$. In the next experiments, 
$r_d$ is set to 1. The second set of experiments deals with $\sigma_e$ in the 
same way. Table 2 shows the mean of the relative score of OLGA($\sigma_e$) when 
$\sigma_e$ varies from 0.2 up to 1. OLGA($\sigma_e$=1) yields the worst score 
while using less time. This experiment confirms the role played by $\sigma_e$ in 
the move pruning process. In the next experiments, $\sigma_e$ is set to 0.2. 

4.2 The All-Moves-As-First Heuristic 

When evaluating the terminal position of a given random game, this terminal 
position may be the terminal position of many other random games in which 
the first move and another friendly move of the random game are reversed. 
Therefore, when playing and scoring a random game, we may use the result 
either for the first move of the game only, or for all moves played in the game as 
if they were the first to be played. The former is the basic idea, the latter is what 
was performed in Gobble, and we use the term all moves as first heuristic. 
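
The bookkeeping of the heuristic can be sketched as follows. This is a 
minimal illustration, not the code of Gobble or Oleg. The names new_stats, 
amaf_update and amaf_value are hypothetical, scores are assumed to be 
expressed from Black's viewpoint, and a move replayed on an already used 
intersection is ignored so that only the player who played there first is 
credited. 

```python
from collections import defaultdict

def new_stats():
    # (player, point) -> [number of games, sum of final scores]
    return defaultdict(lambda: [0, 0.0])

def amaf_update(stats, game_moves, score):
    """Credit the final score to every move of the game as if played first.

    game_moves is the list of (player, point) pairs in playing order.
    """
    seen = set()
    for player, point in game_moves:
        if point in seen:
            continue  # replayed after a capture: only the first player counts
        seen.add(point)
        entry = stats[(player, point)]
        entry[0] += 1
        entry[1] += score

def amaf_value(stats, player, point):
    """Mean final score of the games in which player was first to play point."""
    games, total = stats.get((player, point), (0, 0.0))
    return total / games if games else None

stats = new_stats()
amaf_update(stats, [("B", "A1"), ("W", "B2"), ("B", "C3"), ("W", "A1")], 7.0)
amaf_update(stats, [("B", "C3"), ("W", "A1"), ("B", "B2")], -3.0)
```

In the first game of the example White replays A1 after Black, so that game 
only updates Black's statistics for A1. 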

4.2.1 Advantages and Drawbacks. The idea is attractive, because one 
random game helps evaluate almost all possible moves at the root. However, 
it does have some drawbacks because the evaluation of a move from a random 
game in which it was played at a late stage is less reliable than when it is played at 
an early stage. This phenomenon happens when captures have already occurred 
at the time when the move is played. In Figure 2 the values of the moves A for 
Black and B for White largely depend on the order in which they are played. 

There might be more efficient ways to 
analyse a random game and decide whether 
the value of a move is the same as if it was 
played at the root. Thus, we would ob- 
tain the best of both worlds: efficiency and 
reliability. To this end, at least one easy 
thing should be done (it has already been 
done in Gobble and in Oleg): in a ran- 
dom game, if several moves are played at 
the same place because of captures, modify 
the statistics only for the player who played 
first. 

The method has another troublesome side-effect: it does not evaluate the value 
of an intersection for the player to move but rather the difference between the 
values of the intersection when it is played by each player. Indeed, in most 
random games, any intersection will be played either by one player or the other, 
with an equal probability of about 1/2 (an intersection is almost always played 
at least once during a random game). Therefore, the average score of all random 
games lies approximately in the middle between the average score when White 
has played a move and the average score when Black has played a move. Most 
often, this problem is not serious, because the value of a move is often the 
same for both players; but sometimes it is the opposite. In Figure 3 the point 
C is good for White and bad for Black. On the contrary D and E are good for 
Black only. 

Figure 2. The move order is important. 

Figure 3. The value of moves may be very different for both players. 



4.2.2 Experimental Comparison with Progressive Pruning. Com- 
pared to the very slow basic idea the gain in speed offered by the all-moves-as- 
first heuristic is very important. In contrast to the basic idea or PP, the number 
of random games to be played becomes independent of the number of legal 
moves. This is the main feature of this heuristic. Instead of playing a 9x9 game 
in more than two hours by using the basic idea, Olga plays in five minutes 
with the use of this heuristic. However, we have seen two problems due to the 
use of this heuristic. Therefore, how do the uses of all moves as first heuristic 
and progressive pruning compare in strength? 

Table 3 shows the mean of the relative scores of OLGA(Basic idea) and 
Olga(PP) against OLGA(all moves as first). 

Table 3. Relative scores of Olga with the basic idea or with PP, against the 
all-moves-as-first heuristic. 

While the previous section underlines that PP decreases the level of Olga by 
about five or ten points according to the value of $r_d$, the all-moves-as-first 
heuristic decreases the level by almost fifteen points. The confrontation 
between Olga(PP) and OLGA(all moves as first) shows that PP remains better in 
strength. 



4.2.3 Influence of the Number of Random Games. The standard deviation 
$\sigma$ of the random games usually amounts to 45 points at the beginning 
and in the middle game, and diminishes in the endgame. If we play $N$ random 
games and take the average, the standard deviation is $\sigma/\sqrt{N}$. This 
calculation helps find how many random games to play so that the evaluations of 
the moves become sufficiently close to their expected outcome. From a practical 
point of view the question is: how does this relate to the level of play? 
Table 4 shows the result of Oleg($N$ = 10,000) against Oleg($N$ = 1,000) and 
Oleg($N$ = 100,000). 

Table 4. Relative scores of Oleg with different values of $N$, against 
Oleg($N$ = 10,000). 

We can conclude that 10,000 random games per move is a good compromise when 
using the all-moves-as-first heuristic. Since Oleg is able to play 10,000 
random games per second, this means it can play one move per second while 
using only this heuristic. 
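
As a worked illustration of the formula, taking $\sigma = 45$ points (the 
opening and middle-game figure quoted above) for the three values of $N$ that 
are compared: 

```latex
\[
\frac{45}{\sqrt{1\,000}} \approx 1.42, \qquad
\frac{45}{\sqrt{10\,000}} = 0.45, \qquad
\frac{45}{\sqrt{100\,000}} \approx 0.14 .
\]
```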




4.3 Temperature 

Instead of making the temperature start high and decrease as we play more 
random games, it is simpler to make it a constant. The temperature has been 
implemented in Oleg in a somewhat different way than in Gobble. In the 
latter, two lists of moves were maintained, one for each player, and the moves 
in the random games were played in the order of the lists (if the move in the 
list was not legal, the next one in the list was taken). Between each random 
game, the lists were sorted according to the current evaluation of the moves 
and then moves were shifted in the list with a probability depending on the 
temperature. 

In Oleg, in order to choose a move in a random game, we consider all the 
legal moves and play one of them with a probability proportional to 

\[ \exp(Kv), \] 

where $v$ is the current evaluation of the move and $K$ a constant which must be 
seen as the inverse of the temperature ($K = 0$ means $T = \infty$). A drawback 
of this method is that it slows down the speed of the random games to about 
2,000 per second. Table 5 shows the results of Oleg($K$ = 2) against Oleg($K$) 
for a few values of $K$. 
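
The selection rule can be sketched as follows. This is a minimal illustration 
rather than Oleg's code: the function name choose_with_temperature is 
hypothetical, and the scale on which the evaluations $v$ are expressed is not 
specified in the text, so the example values are arbitrary. 

```python
import math
import random

def choose_with_temperature(evaluations, K, rng):
    """Pick a move with probability proportional to exp(K * v).

    evaluations maps each legal move to its current evaluation v.
    K = 0 gives uniformly random play (infinite temperature).
    """
    moves = list(evaluations)
    top = max(evaluations.values())
    # Subtracting the maximum leaves the probabilities unchanged and
    # avoids overflow in exp for large K * v.
    weights = [math.exp(K * (evaluations[m] - top)) for m in moves]
    return rng.choices(moves, weights=weights, k=1)[0]

rng = random.Random(1)
evals = {"a": 0.0, "b": 1.0}
greedy_picks = [choose_with_temperature(evals, 5, rng) for _ in range(1000)]
uniform_picks = [choose_with_temperature(evals, 0, rng) for _ in range(1000)]
```

Computing the weights of all legal moves at every step of a random game is 
consistent with the slowdown mentioned above. 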

So, there is indeed something to be gained by using a constant temperature. 
This is probably because the best moves are played early and thus obtain a 
more accurate evaluation. However, it is bad to have $K$ too large. The best we 
have found is $K = 5$. 

Table 5. Relative scores of Oleg with different values of $K$ against 
Oleg($K$ = 2). 

4.4 Simulated Annealing 

Simulated annealing (Kirkpatrick, Gelatt, and Vecchi, 1983) was presented 
in Bruegmann (1993) as the main idea of the method. We have seen that 
it is perfectly possible not to use it, so the question arises: what is its real 
contribution? 

To answer the question we performed some experiments with simulated an- 
nealing in Oleg. In our implementation the variable $K$ increases as more 
random games are played. However, we have not been able to achieve sig- 
nificantly better results this way than with $K$ set to a constant. For example, 
we have made an experiment between Oleg with simulated annealing and 
$K$ varying from 0 to 5, and Oleg with $K = 5$. The version with simulated 
annealing won by 1.6 points on average. 

The motivation for using simulated annealing was probably that the program 
would gain some reading ability, but we have not seen any evidence of this, 
the program making the same kind of tactical blunders. Besides, the way 
simulated annealing is implemented in Gobble is not classical. Simulated 
annealing normally has an evaluation that depends only on the current state (in 
the case of Gobble, a state is the lists of moves for both players); instead 
in Gobble the evaluation of a state is the average of all the random games 
that are based on all the states reached so far. There may be a way to design a 
true simulated-annealing-based Go program, but speed would then be a major 
concern. 

4.4.1 OLEG against VEGOS. Vegos is a recent Go program available 
on the web (Kaminski, 2003). It is based on the same ideas as Gobble; 
in particular, it uses simulated annealing. A confrontation of 20 games against 
Oleg($K$ = 2, without simulated annealing) has resulted in an average win of 
7.5 points for Oleg. We did not perform more games because we had to play 
them by hand. The playing styles of the programs are similar, with slightly 
different tactical weaknesses. The result of this confrontation is another reason 
why we doubt that simulated annealing is crucial for Monte-Carlo Go. 

4.5 Depth-2 Enhancement 

For the depth-2 enhancement the given position is the root of a depth-two 
min-max tree. Let us start the random games from the root by two given moves, 
one move for the friendly side, and, then, one move for the opponent, and make 
statistics on the terminal position evaluation for each node situated at depth 2 
in the min-max tree. At depth-one nodes, the value is computed by using the 
min rule. When a depth-one value has been proved to be inferior to another 
one, then this move is pruned, and no more random games are started with this 
move first. This variant is more costly in time because, if $n$ is the number 
of possible moves, about $n^2$ statistical variables must be sampled, instead of 
$n$ only. 
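
The backing-up step can be sketched as follows. This is a minimal illustration 
that ignores pruning and sampling: the function name depth_two_choice and the 
example means are hypothetical, and the means are assumed to be expressed from 
the friendly side's viewpoint. 

```python
def depth_two_choice(means):
    """Back up depth-two statistics with the min rule, then maximize.

    means[first][reply] is the mean score of the random games started with
    the friendly move first followed by the opponent move reply.
    """
    depth_one = {first: min(replies.values()) for first, replies in means.items()}
    best = max(depth_one, key=depth_one.get)
    return best, depth_one

# Move A looks excellent unless the opponent answers at y; move B is solid.
means = {"A": {"x": 10.0, "y": -5.0},
         "B": {"x": 2.0, "y": 1.0}}
best, depth_one = depth_two_choice(means)
```

In the example, A is rejected because its value collapses when the opponent 
replies at y, which is the kind of move the enhancement is meant to suppress. 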

We set up a match between two versions of Olga using progressive pruning 
at the root node. OLGA(Depth=1) backs up the statistics about random games 
at depth one while OLGA(Depth=2) backs up the statistics at depth two and uses 
the min rule to obtain the value of depth-one nodes. The values of the parameters 
of OLGA(Depth=1) are the same as the parameters of the PP program. The 
minimal number of random games without pruning is set to 100. The maximal 
number of random games is also fixed to 10,000 multiplied by the number of 
legal moves, $r_d$ is set to 1, and $\sigma_e$ is set to 0.2. While OLGA(Depth=1) 
only uses 10 minutes per 9x9 game, OLGA(Depth=2) is very slow. In order to 
speed up OLGA(Depth=2), we use the all-moves-as-first heuristic. Thus, it uses 
about 2 hours per 9x9 game, which yields results in a reasonable time. 

Table 6 shows the mean of the relative score of Prog(Depth=2) against 
Prog(Depth=l), Prog being either Olga or Oleg. 

Intuitively, the results should be better for the depth-two programs, but they 
are actually slightly worse. How can this be explained? 

Table 6. Relative scores of Prog(Depth=2) against Prog(Depth=1). 

The first possible explanation lies in the min-max oscillation observed at the 
root node when performing iterative deepening. A depth-one search overestimates 
the min-max value of the root while a depth-two search underestimates the min- 
max value. Thus, the depth-two min-max value of the root node is more difficult 
to separate from the evaluation of the root (also obtained with random simu- 
lations) than the depth-one min-max value is. In this view, OLGA(Depth=2) 
passes on some positions on which OLGA(Depth=1) does not. In order to test 
the validity of this explanation, a depth-three experiment becomes mandatory. 
If depth three performs well, then the explanation would be reinforced; 
otherwise another explanation is needed. 

The second explanation is statistical. Let $Z$ be a random variable which is 
the maximum of 10 identical random variables $X_i$ ($0 \le i \le 9$) with 
mean($X_i$) = 0 and standard deviation $\sigma(X_i) = 1$, plus a last one $Y$ 
with mean($Y$) = $\delta > 0$ and standard deviation $\sigma(Y) = 1$. We have 
$Z = \max(X_0, \ldots, X_9, Y)$. Table 7 provides the mean and standard 
deviation of $Z$. 

Table 7. Mean and standard deviation of $Z$ with different values of $\delta$. 
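
The behaviour of $Z$ can be explored with a short simulation. The sketch below 
assumes that the eleven variables are independent and normally distributed; 
the text does not state the distribution, so this is an assumption, and the 
function name simulate_max is hypothetical. 

```python
import random
import statistics

def simulate_max(delta, trials=20_000, seed=0):
    """Estimate mean and standard deviation of Z = max(X_0, ..., X_9, Y).

    X_i ~ N(0, 1) and Y ~ N(delta, 1), all independent (an assumption).
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(trials):
        xs = [rng.gauss(0.0, 1.0) for _ in range(10)]
        y = rng.gauss(delta, 1.0)
        samples.append(max(xs + [y]))
    return statistics.fmean(samples), statistics.stdev(samples)

mean_equal, sd_equal = simulate_max(0.0)   # all eleven moves equal
mean_clear, sd_clear = simulate_max(8.0)   # one clearly better move
```

Under these assumptions the estimated mean for $\delta = 0$ should lie close 
to 1.6, and for a large $\delta$ the pair (mean, standard deviation) should 
approach ($\delta$, 1). 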



Table 7 shows that, on positions in which all 11 moves are equal ($\delta = 0$), 
performing a max (resp. min) leads to a positive (resp. negative) value (1.58) 
significantly greater (resp. smaller) than the (resp. opposite of the) standard 
deviation of each move (1). Therefore, when performing a depth-two search, the 
depth-one nodes are largely underestimated and, given these depth-one 
estimations, the root node is largely overestimated. Thus, when the number of 
games is not sufficient, the error propagates once in the negative direction 
and then in the positive one. To sum up, when the moves are almost equal, the 
min-max value at the root node contains a great deal of randomness. 

Table 7 also points out another explanation. When $\delta < 2$, mean($Z$) and 
$\sigma(Z)$ remain quite different from $\delta$ and 1 respectively. But when 
$\delta > 4$, both mean($Z$) and $\sigma(Z)$ are almost equal to $\delta$ and 1 
respectively. Thus, on positions with one best move only and ten average moves, 
the mean value of the max value becomes exact only when the difference between 
the best move evaluation and the other move evaluation is about four times the 
value of the standard deviation of the move evaluations. 

These two remarks show that, when using the depth-two enhancement, a 
great deal of uncertainty is contained in the min value of depth-one nodes and 
even more in the min-max value of the root node. 



4.6 An All-against-All Tournament 

To evaluate the Monte-Carlo approach against the knowledge-based ap- 
proach, this subsection provides the results of an all-against-all 9x9 tourna- 
ment between Olga, Oleg, Indigo and GnuGo. GnuGo (Bump, 2003) 
is a knowledge-based Go program developed by the Free Software Foundation. 
We used the 3.2 version released in April 2002. Indigo2002 (Bouzy, 2002) is 
another knowledge-based program whose move decision process is described 
in Bouzy (2003). Olga means OLGA(Depth=1, $r_d$=1, $\sigma_e$=0.2) using PP and 
not the all-moves-as-first heuristic. Oleg uses the all-moves-as-first heuristic, 
a constant temperature corresponding to $K$=2, and it does not use PP. Table 8 
shows the grid of the all-against-all tournament. 

First, Monte Carlo excepted, our tests show that, on a 9x9 board, GnuGo 3.2 
is about 8.7 points better than Indigo2002. Then, considering Monte Carlo, 
both Olga and Oleg are far below GnuGo (more than thirty points on average). 
However, given the very large difference of complexity between the move 
generator of GnuGo and our move generators, this result is quite 
satisfactory. Against Indigo, both Olga and Oleg perform well. The three 
programs beat one another circularly. On 9x9 boards, we may say that Oleg 
and Olga, containing very little knowledge, have a level comparable to that 
of Indigo, which contains a large amount of knowledge. The result between two 
very different architectures, statistical and knowledge-based, is quite 
enlightening. 

Besides, we have made tests on larger boards. Although the number of 
games played is not sufficient to obtain significant results, they give an idea 
of the behaviour of Monte-Carlo programs in such situations. On the basis 
of twenty 13x13 games only, Olga is 17 points below Indigo. On a 19x19 
Go board, a seven-game confrontation between Oleg and GnuGo was won 
by GnuGo with an average margin of 83 points. Oleg takes a long time to 
play (about 3 hours per game) for several reasons. First, the random games are 
longer. Second, we must play more of them to have an accurate evaluation of 
the moves (we did it with 50,000 random games per move). Lastly, the main 
game itself is longer. In those games, typically Oleg makes a large connected 
group in the centre with just sufficient territory to live and GnuGo gets the 
points on the sides. 

5. Discussion 

While showing a sample game between Oleg and its author, this section 
discusses the strengths and weaknesses of the statistical approach and opens up 
some promising perspectives. 

5.1 Strengths and Weaknesses 

On the programmer’s side, the main strength of the Monte-Carlo approach 
is that it uses very little knowledge. First, a Monte-Carlo game program can 
be developed very quickly. As Bruegmann (1993) did for the game of Go, this 
upside must be underlined: the programmer has to implement efficiently the 
rules of the game and eyes, and that is all. All other knowledge can be left 
aside. Second, the decomposition of the whole game into sub-games, a feature 
of knowledge-based programs, is avoided. This decomposition introduces a bias 
in knowledge-based programs, and Monte-Carlo programs do not suffer from 
this downside. Finally, the evaluations are performed on terminated games, 
and, consequently, the evaluation function is trivial. On the other hand, 
Monte-Carlo Go programs are weak tactically, they are still slower than 
classical programs, and, at the moment, it is difficult to make them play on 
boards larger than 13x13. 

From the human user’s viewpoint, any Monte-Carlo Go program underestimates 
the positions for both sides. Thus, it likes to keep its own strength. As a result, 
it likes to make strongly connected shapes. Conversely, it looks for weaknesses 
in the opponent’s position that do not exist. This can be seen in the game of 
Figure 4. It was played between Oleg as Black and its author as White. Oleg 
was set with $K$ = 5 and 10,000 random games per move. White was playing 
relatively softly in this game and did not try to crush the program. 




Figure 4. OLEG(B)-Helmstetter(W). White wins by 17 points plus the komi. 



5.2 Perspectives 

First, eliminate the tactical weakness of the Monte-Carlo method with a 
preprocessing step containing tactical search. Second, use domain-dependent 
knowledge to play pseudo-random games. Third, build statistics not only on the 
global score but on other objects. 

5.2.1 Preprocessing with Tactical Search. The main weakness of the 
Monte-Carlo approach is tactics. Therefore, it is worth adding some tactical 
modules to the program. As a first step it is easy to add a simple tactical 
module which reads ladders. This module can be either a preprocessing module 
or a post-processing module to the Monte-Carlo method. In this context, each 
module is independent of the other one, and does not use the strength of the other 
one. Another idea would consist in making the two modules interact. When 
the tactical module selects moves for the random games, it would be useful for 
Monte Carlo to use the already available tactical results. This approach would 
require a quick access to the tactical results, and would slow down the random 
games. The validity of the tactical results would depend on the moves already 
played and it would be difficult to build an accurate mechanism to this end. 
Nevertheless, this approach looks promising. 

5.2.2 Using Domain Dependent Pseudo-random Games. Until now, a 
program using random games and very little knowledge has a level comparable 
to Indigo2002. Thus, what would be the level of a program using domain 
dependent pseudo-random games? As suggested by Bruegmann (1993), a first 
experiment would be to make the random program use patterns giving the 
probability of a move advised by the pattern. The pattern database should be 
built a priori and should not introduce too much bias into the random games. 
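
The idea of pattern-guided pseudo-random games can be illustrated by a minimal sketch. It is not the move generator of Olga or Oleg: the function name `choose_move` and the callable `pattern_weight` (returning a non-negative urgency for a move, as a pattern database might) are assumptions introduced only for illustration.

```python
import random


def choose_move(legal_moves, pattern_weight, rng=random):
    """Pick one move at random, biased by a pattern-derived weight.

    `pattern_weight` is a hypothetical callable returning a non-negative
    urgency for a move. Giving every unmatched move a small positive weight
    keeps the games random and limits the bias introduced by the patterns.
    """
    weights = [pattern_weight(m) for m in legal_moves]
    if sum(weights) <= 0:
        # No pattern advises anything: fall back to a uniform random move.
        return rng.choice(legal_moves)
    return rng.choices(legal_moves, weights=weights, k=1)[0]
```

With all weights equal, the sketch reduces to the uniform random games used so far, which makes it easy to compare the two settings.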

5.2.3 Exploring the Locality of Go with Statistics. To date, we have 
estimated the value of a move by considering only the final scores of the random 
games where it had been played. Thus, we obtain a global evaluation of the 
move. This is both a strength and a weakness of the method. Indeed, the effect 
of a move is often only local, particularly on 19x19 go boards. We would like 
to know whether and why a move is good. 

It might be possible to link the value of a move to more local subgoals from 
which we could establish statistics. The value of those subgoals could, then, 
be evaluated by linking them to the final score. Interesting subgoals could deal 
with capturing strings or connecting strings together. 

6. Conclusion 

In this paper, we described a Monte-Carlo approach to computer Go. Like 
Bruegmann’s (1993) Monte-Carlo Go, it uses very little domain-dependent 
knowledge, except concerning eyes. When compared to the knowledge-based 
approaches, this approach is very easy to implement. However, its weakness 
lies in the tactics. We have assessed several heuristics by performing experi- 
ments with different versions of our programs Olga and Oleg. Progressive 
pruning and the all-moves-as-first heuristic enable the programs to play more
quickly without decreasing their level much. Then, adding a constant tem- 
perature to the approach guarantees a higher level but yields a slightly slower 
program. Furthermore, we have shown that adding simulated annealing does 
not help: it makes the program more complicated and slower, and the level is 
not significantly better. Besides, we have tried to enhance our programs with 
a depth-two tree search, which did not work well. Lastly, we have assessed
our programs against existing knowledge-based ones, GnuGo and Indigo, 
on 9x9 boards. Olga and Oleg are still clearly inferior to GnuGo (version 
3.2) but they match Indigo. 

We believe that, with the help of the ever-increasing power of computers, this 
approach is promising for computer Go in the future. At least, it provides Go 
programs with a statistical global search, which is less expensive than global 
tree search, and which enriches move generation with a kind of verification. In 
this respect, this approach fills the gap left by global tree search in computer Go 
(no termination) and left by move generation (no verification). We believe that 
the statistical search is an alternative to tree search (Junghanns, 1998) worth 
considering in practice. It has already been considered theoretically within the 
framework of Rivest (1988). In the near future, we plan to enhance our Monte- 
Carlo approach in several ways: adding tactics, inserting domain-dependent 
knowledge into the random games, and exploring the locality of Go with more 
statistics. 

Abstract Computer-Go programs have high computational costs for static analysis, even
though most intersections of the board remain unchanged after one move. There- 
fore, we introduced the method of incremental computation as an essential feature 
in Go programming. This paper explores how incremental computation is ap- 
plied to the static analysis in Go programs, and describes two types of analysis 
and pattern recognition. One type is determination in cases where the territories 
of groups are almost determined. This includes (1) the methods of determining 
the life and death of a group by numerical features and (2) the method of finding 
the numbers of regions enclosed by the groups based on Euler’s formula. The 
other type is estimation of groups of stones and territories by analysing the in- 
fluence of stones using an “electric charge model” in cases where the density of 
stones is rather low. In the analysis, operations on sets of intersections are used 
for mathematical descriptions when applying incremental computation as well 
as definitions of the notions on the Go board. 

Keywords: incremental computation, Euler’s formula, life and death, potential distribution,
electric charge model

1. Introduction 

The strength of computer-Go programs is generally considered as a begin- 
ners’ level despite all efforts by many researchers. Many Go players in Japan 
estimate the current best Go programs as playing at around 4 or 5 kyu in amateur 
rating, although the Japan Go Association recently certified some Go programs 
as one dan. This is stronger than 5 kyu; the difference is 5 handicap stones. 
The progress in playing strength is considered rather slow compared to that of 
computer Shogi. The latter game is also considered very difficult, but appar- 
ently the Shogi programs are steadily improving. We assume that investigating 
the theoretical and mathematical foundations of the game as well as applying 
the results in practical Go programming are significant for computer Go. 

It is widely accepted that an efficient static analysis is essential to improve
the playing strength of computer-Go programs. However, the costs of such an 
analysis are much higher than those of chess and Shogi. The static analysis 
needs to be repeated not only at every move, but also at every step in the search 
tree. 

In this paper, we explore how the incremental computation can be applied to 
static analysis. We discuss two types of static analysis and pattern recognition 
in computer Go: determination and estimation. The first type, determination, 
contains the analysis of cases where the territories of the groups have been 
almost determined. This includes (1) the methods of determining the life and 
death of a group by the numerical features and (2) the method of finding the 
numbers of regions enclosed by the groups based on Euler’s formula. The other 
type, estimation, deals with the estimation of groups of stones and territories 
on the board when the density of stones is rather low by analysing the influence 
of stones using an electric charge model. 

The aim of the static analysis is to obtain the phase of the board, which is 
a collection of overall aspects of the board configuration, such as territories of 
black and white stones, influence of stones, and life and death of the groups. In 
most cases, the change in board configurations is restricted to one intersection 
except for capturing, which seldom occurs. The largest part of the phase usually 
remains unchanged for one move, although there are cases where the phase 
changes vastly by one move. By using incremental computation for obtaining 
the phase of the board, we can restrict the evaluation process to the parts changed 
without repeating the same process for any unchanged part of the configuration. 
Since the game of Go requires high computational costs for the static analysis, 
incremental computation is especially effective for computer Go. 

In most previous publications on static analysis in computer Go, the main 
subject dealt with determining the life and death of groups of stones. Those 
works include: the theoretical study of static life (Benson, 1976); determining 
the life and death of groups by some local features including perimeters of the 
empty regions (Chen and Chen, 1999) and by tactical analysis and eye values 
(Fotland, 2002); and static analysis by position evaluation (Muller, 2002). The 
application of combinatorial game theory to yose problems (Berlekamp and 
Wolfe, 1994) is another theoretical result. Nakamura (2000, 2001) presented 
basic approaches to the life-and-death problem, which included estimating the 
number of eyes based on Euler’s formula for connected planar graphs and 
analysing capturing races by semeai graphs. 

There are few papers that discuss the method of incremental computation in 
computer Go so far. Most Go-playing programs seem to have some mechanism 
for incremental computation. Klinger and Mechner (1996) and Bouzy (1997) 
describe some methods for incremental updating of data in Go programs. These 
two publications contain elements of the basics of incremental computation
since they take into account the knowledge maintenance and backtracking. 

Since the early program by Zobrist (1969), most Go programs, including 
Indigo (Bouzy, 1995), Go Intellect (Chen, 1989), Handtalk (Chen, 
2002), Explorer (Muller, 2002), and Jimmy 5.0 (Yan and Hsu, 2001) em- 
ploy mechanisms for evaluating the influence of stones and determining terri- 
tories. An important feature of our electric charge model is the computation of 
the potential distribution which is based on incremental computation. Another 
feature is that some aspects of Go boards can be described in detail by potential 
distributions. 

This paper is organized as follows. In Section 2, we describe operations on 
the set of intersections on the board, which are used for representing features 
of pattern analysis as well as mathematical descriptions of incremental com- 
putation. Section 3 describes methods of recognizing blocks and groups based 
on the set operations, and discusses a method of identifying the life and death 
of a group enclosing a region by the numerical features of the regions defined 
by the set operations. Section 4 shows an improved method of estimating the 
number of regions enclosed by the groups based on Euler’s formula for planar 
graphs. Section 5 outlines another approach of static analysis for recognizing 
groups and finding the influence of stones based on the electric charge model 
and on incremental computation. 

2. Set Operations and Incremental Computation 

In this section, we define several constants and some operations on the sets 
of intersections. We show the relation of the operations with incremental com- 
putation. Our intention is not to use the sets of intersections and the operations 
directly for the analysis, but to define basic notions on Go boards and to use 
incremental computation only for the parts that changed in every move. 

2.1 Operations on Sets of Intersections 

The Board is the set $\mathcal{B} = \{(i, j) \mid 1 \le i, j \le N\}$ of intersections. In the
standard rules $N$ is 19. A configuration is represented by two disjoint sets $B \subseteq \mathcal{B}$
and $W \subseteq \mathcal{B}$ of intersections occupied by black and white stones, respectively.
The intersections in $B$ or $W$ are called black or white stones, respectively. The
other elements of Board, $\mathcal{B} - B - W$, are empty intersections. An intersection
$(i, j)$ is adjacent to an intersection $(m, n)$, if and only if $|i - m| + |j - n| = 1$. An
intersection $(i, j)$ is adjacent to a set $S$ of intersections, if and only if $(i, j) \notin S$
and there is $(m, n) \in S$ such that $(i, j)$ is adjacent to $(m, n)$.

The board $\mathcal{B}$ and the empty set $\emptyset$ are constants. Another constant is Edge $\square$
defined by

$\square = \{(i, j) \mid i = 1,\ i = N,\ j = 1 \text{ or } j = N\}$.

Figure 1. An example of a configuration and the results of extended operations.



We have three types of operations: Boolean, shift, and extended operations.

The Boolean operations include union $\cup$, intersection $\cap$ and set difference $-$.
There are four shift operations. The operation Shift Left $\overleftarrow{A}$ is defined by
$\overleftarrow{A} = \{(i - 1, j) \mid (i, j) \in A,\ i \ge 2\}$. The value of $\overleftarrow{A}$ is the set of
intersections which are shifted left from the intersections in $A$. The intersections
on the left edge in $A$ are eliminated. Other shift operations are Shift Right $\overrightarrow{A}$,
Shift Down $A\downarrow$ and Shift Up $A\uparrow$; they are defined analogously.

For a set $X$, $|X|$ denotes the number of elements in $X$. The following
extended operations are used for representing features of enclosed regions in
Subsection 3.2.



$\mathit{exterior}(X) = \{(i, j) \mid (i, j) \text{ is adjacent to } X\}$

$\mathit{thicken}(X) = X \cup \mathit{exterior}(X)$

$\#\mathit{adjacent}(X) = |X \cap \overleftarrow{X}| + |X \cap X\downarrow|$

Some examples of these operations are shown in Figure 1. We represent 
a configuration (Figure 1(a)) by sets B (Figure 1(b)) and W (Figure 1(c)) of 
black and white stones, respectively. The value of #adjacent(B) is 11, and 
that of #adjacent(W) is 3. 
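
The operations of this subsection can be sketched with ordinary sets of coordinate pairs. This is only an illustration under our own assumptions (the helper names `shift` and `num_adjacent` and the representation by Python sets are not taken from the implementation described in this paper).

```python
N = 19  # board size under the standard rules


def shift(X, di, dj, n=N):
    """Shift every intersection of X by (di, dj); intersections leaving the board are dropped."""
    return {(i + di, j + dj) for (i, j) in X
            if 1 <= i + di <= n and 1 <= j + dj <= n}


def exterior(X, n=N):
    """Intersections outside X that are adjacent to at least one element of X."""
    around = (shift(X, -1, 0, n) | shift(X, 1, 0, n)
              | shift(X, 0, -1, n) | shift(X, 0, 1, n))
    return around - X


def thicken(X, n=N):
    """X together with its exterior."""
    return X | exterior(X, n)


def num_adjacent(X, n=N):
    """#adjacent(X): number of adjacent-to pairs inside X (one horizontal and one vertical shift)."""
    return len(X & shift(X, -1, 0, n)) + len(X & shift(X, 0, -1, n))
```

For instance, the three stones {(3, 3), (3, 4), (4, 3)} contain two adjacent-to relations, and their exterior consists of seven intersections.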

2.2 Operations and Incremental Computation 

Let $Y$ be any set of stones of the same colour, and $A$ be a set of one stone
of the same colour, such that $Y \cap A = \emptyset$. Incremental computation of an
operation $Op$ for $Y \cup A$ means finding the result $Op(Y \cup A)$ from the value
$Op(Y)$ and the operations on the neighbour intersections of $A$. The costs of
incremental computation are generally lower than those of a full computation,
since the change caused by adding a stone in $A$ is restricted to the neighbour
intersections of this stone. The results of incremental computation for the basic
operations are as follows.

$X \cup (Y \cup A) = (X \cup Y) \cup A$

$X \cap (Y \cup A) = (X \cap Y) \cup (X \cap A)$

$X - (Y \cup A) = (X - Y) - A$

$(Y \cup A) - X = (Y - X) \cup (A - X)$

$\overleftarrow{Y \cup A} = \overleftarrow{Y} \cup \overleftarrow{A}$

Incremental computation for other shift operations is defined analogously. We
note that the rightmost terms in the equations represent the changes. We also
note that $A - X = \emptyset$, if $A \subseteq X$, and otherwise $A - X = A$. The results of
the method for the extended operations are shown below, with $|S|$ being the
number of elements in a set $S$.

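$\mathit{thicken}(Y \cup A) = \mathit{thicken}(Y) \cup \mathit{thicken}(A)$

$\mathit{exterior}(Y \cup A) = \mathit{thicken}(Y) \cup \mathit{thicken}(A) - (Y \cup A)$
$\qquad = (\mathit{exterior}(Y) - A) \cup (\mathit{exterior}(A) - Y)$

$\#\mathit{adjacent}(Y \cup A) = |(Y \cup A) \cap \overleftarrow{(Y \cup A)}| + |(Y \cup A) \cap (Y\downarrow \cup A\downarrow)|$
$\qquad = \#\mathit{adjacent}(Y) + |Y \cap \mathit{exterior}(A)|$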
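
As a hedged illustration of the last two identities, the following sketch updates the exterior and the #adjacent value when one stone is added, and can be compared with a full recomputation. The function names and the tuple-based board representation are our assumptions for the example only.

```python
def neighbours(p, n=19):
    """The (up to four) on-board intersections adjacent to p."""
    i, j = p
    cand = [(i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)]
    return {(a, b) for (a, b) in cand if 1 <= a <= n and 1 <= b <= n}


def exterior(X, n=19):
    """Full computation of exterior(X)."""
    out = set()
    for p in X:
        out |= neighbours(p, n)
    return out - X


def num_adjacent(X, n=19):
    """Full computation of #adjacent(X); every pair is seen from both ends."""
    return sum(len(neighbours(p, n) & X) for p in X) // 2


def add_stone(Y, ext_Y, adj_Y, p, n=19):
    """Incremental update for A = {p}, assuming p is not in Y.

    exterior(Y u A)  = (exterior(Y) - A) u (exterior(A) - Y)
    #adjacent(Y u A) = #adjacent(Y) + |Y n exterior(A)|
    """
    A = {p}
    ext_A = neighbours(p, n)          # exterior of a single stone
    new_ext = (ext_Y - A) | (ext_A - Y)
    new_adj = adj_Y + len(Y & ext_A)
    return Y | A, new_ext, new_adj
```

Only the (at most four) neighbours of the new stone are inspected in `add_stone`, whereas the full functions visit every stone of the set.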

3. Static Analysis Based on Set Operations 

In this section, we define blocks and groups, and discuss a method of deter- 
mining the life and death of a group that depends on the shapes of the enclosed 
region and on the positions of the opponent stones in the region. We imple- 
mented and tested most of the methods in Sections 3 and 4 in Prolog. 

3.1 Blocks and Group 

A connected set of intersections is defined by the following recursive rules.

1 A set of one intersection is connected.

2 For any set $T$ of intersections, if a subset $S \subseteq T$ is connected, then
$\mathit{thicken}(S) \cap T$ is connected.

We represent a board configuration by sets $B$ and $W$ of black and white
stones. The set $E$ of empty intersections is given by $E = \mathcal{B} - B - W$. A black
block is a connected set $B_x \subseteq B$ such that $\mathit{thicken}(B_x) \cap B = B_x$. White
blocks are defined analogously. An empty region is a connected set $E_x \subseteq E$
such that $\mathit{thicken}(E_x) \cap E = E_x$.

A liberty, or dame, of a black (or white) block $B_x$ ($W_x$) is an empty
intersection in the exterior of the block. Hence we have

$\mathit{liberty}(B_x, W) = \mathit{exterior}(B_x) \cap E = \mathit{exterior}(B_x) - W$.

We note that every block has at least one liberty, since any block without a
liberty is dead and removed from the board.
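
The recursive definition of a block amounts to growing a set by repeated application of thicken followed by intersection with the stones of one colour. A minimal sketch follows; the names `block_of` and `liberties` and the use of Python sets are illustrative assumptions, not the Prolog implementation mentioned above.

```python
def neighbours(p, n=19):
    """The (up to four) on-board intersections adjacent to p."""
    i, j = p
    cand = [(i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)]
    return {(a, b) for (a, b) in cand if 1 <= a <= n and 1 <= b <= n}


def block_of(stone, stones, n=19):
    """Block containing `stone`: the fixed point of S -> thicken(S) n stones."""
    block = {stone}
    frontier = [stone]
    while frontier:
        p = frontier.pop()
        for q in neighbours(p, n):
            if q in stones and q not in block:
                block.add(q)
                frontier.append(q)
    return block


def liberties(block, black, white, n=19):
    """Empty intersections in the exterior of the block."""
    ext = set()
    for p in block:
        ext |= neighbours(p, n)
    return ext - block - black - white
```

A block returned by `block_of` satisfies the closure condition thicken(block) ∩ stones = block by construction.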

The configuration in Figure 1 (a) contains two black blocks, two white blocks 
and three small empty regions. The inner black block has two liberties. The two 
black blocks enclose the region of five white stones and five empty intersections. 

A group is an important notion that is defined to be either a block or a union 
of blocks of the same colour such that the blocks are “dynamically” connected, 
i.e., the opponent cannot cut, or separate, the blocks. Although some groups, 
such as blocks connected by kosumi (diagonal) relations, can be recognized 
by static analysis, precise recognition needs dynamic analysis, as discussed 
in Nakamura (2002). We call the group in this narrow sense the linked group,
which is a set of stones connected by adjacent-to or kosumi relations. In Section 
5 it is shown that most of the groups in the broad sense are recognized by static 
analysis based on the electric charge model. 

A group is alive, if the opponent player cannot capture it, and dead otherwise.
Practically, a group is alive, if it has two eyes (i.e., small enclosed regions), if it
can be changed to form two eyes or a seki, or if the group side wins the capturing
race involving this group. There are cases where the life and death depend on a
ko in the group.

3.2 Life and Death of Groups Enclosing Regions 

Following Berlekamp and Wolfe (1994) and Chen and Chen (1999), we 
represent the types of enclosed regions related to the life and death of groups 
by pairs $\{\alpha|\beta\}$ of symbols, where $\alpha$ represents the state, if the group side
moves next, and $\beta$, if the opponent moves next. The symbol of the state is
either L, O, S, or K. Symbol L denotes that the group is alive in the sense
that the enclosed region can form two eyes, whereas S denotes that the group
enclosing the region can form a seki. Although the group is alive in both
cases, we distinguish S from L, because the opponent group in the region is
also alive in the seki. Symbol O denotes that the region cannot be two eyes
but only one eye. Symbol K denotes that the region can be changed to have
a ko such that it can have two eyes, if the group side wins the ko, and one eye
otherwise. Possible combinations in this section are {L|L}, {L|S}, {L|O},
{S|O}, {O|O}, {L|K}, and {K|O}. In some cases, life and death depends on
the outer liberties of the group as well as the features of the region. Note that
{L|L} corresponds to 2.0 eyes, {L|O} 1.5 eyes, and {O|O} 1.0 eye in other
publications (Chen and Chen, 1999; Fotland, 2002). Since we discuss only
the states of closed regions enclosed by groups and exclude the case where the
region contains an opponent group with two eyes, we do not use the symbols
for the states of half eyes or empty eyes.

Feature                            Definition

|R|                                size of region R
|exterior(R)|                      perimeter of R
#adjacent(X)                       num. of adjacent-to relations in X
max_neighbour(X)                   max. neighbours in X
|B ∩ R|                            num. of opponent stones in R
max_liberties(B ∩ R)               max. liberties of one stone in B ∩ R
exterior(R ∩ B)                    total num. of opponent stones in R
|R ∩ □|                            num. of intersections in R on the edge
|liberty(W)| - |E ∩ liberty(W)|    outer liberties of the group

Table 1. Features for determining the life and death of a group enclosing a region.

Table 1 shows the list of features used for determining the life and death 
of the groups. In this table, R denotes the region, i.e., the set of intersections 
enclosed by a group, E the set of empty intersections, and B the set of opponent 
stones. We assume that the white group encloses the region in the figures. This 
table contains two features defined as follows. 

$\mathit{max\_neighbour}(X) = \max_{p \in X} |X \cap \mathit{exterior}(\{p\})|$

$\mathit{max\_liberties}(X) = \max_{p \in X} |E \cap \mathit{exterior}(\{p\})|$

We tested this set of features for various patterns of the enclosed regions,
and found that the features are effective for identifying the types of life and death
of the regions of size five to eight, including those in the corners and
those containing opponent stones.

Chen and Chen (1999) have shown that the life and death of a group enclosing
an empty region $R$ can be determined by the features, the perimeters of $R$, and
the existence of a square (a 2x2 block of intersections in $R$), which is given by
$\#\mathit{adjacent}(R) - |R| + 1 \ge 1$ in our terminology. Another possible set of
features for this recognition is $|R|$, $\#\mathit{adjacent}(R)$, and $\mathit{max\_neighbour}(R)$.
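
To make the numerical features concrete, the sketch below computes some of them for a small region. It is an illustration only: the function `region_features` and its dictionary keys are our own names, and the square count #adjacent(R) - |R| + 1 is assumed to be applied to a connected region without holes.

```python
def neighbours(p, n=19):
    """The (up to four) on-board intersections adjacent to p."""
    i, j = p
    cand = [(i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)]
    return {(a, b) for (a, b) in cand if 1 <= a <= n and 1 <= b <= n}


def region_features(R, n=19):
    """Size, perimeter, #adjacent, max_neighbour and square count of a region R."""
    R = set(R)
    ext = set()
    for p in R:
        ext |= neighbours(p, n)
    ext -= R
    adjacent = sum(len(neighbours(p, n) & R) for p in R) // 2
    return {
        "size": len(R),
        "perimeter": len(ext),
        "adjacent": adjacent,
        "max_neighbour": max(len(neighbours(p, n) & R) for p in R),
        "squares": adjacent - len(R) + 1,
    }
```

A squared four, a straight four, and a pyramid four all have size four, yet they differ in the values of `squares` and `max_neighbour`, which is what allows such shapes to be told apart without pattern matching.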

Figure 2 shows typical empty regions in the corner with {L|K} or {L|S}, if
the groups have a few outer liberties. For example, the bent four in the corner
(a) is in {L|L}, if the white group has two or more outer liberties, and {L|K}
otherwise. White can choose {L|K} or {L|S} in (e), if the outer liberties
are zero or one. These patterns can be identified by the features, the number
of intersections on the edge, the perimeters of R, and the number of squares,
#adjacent(R) - |R| + 1.

Figure 2. Patterns of groups with {L|K} and/or {L|S} when the white groups have a few
outer liberties.

Figure 3. Regions with prisoners in size 5 enclosed by white groups.



Figure 3 shows examples of patterns of the enclosed regions containing two 
or three opponent stones (prisoners). We can identify each of the patterns by 
the five features in the table. Note that the groups enclosing the patterns (c) and 
(e) are alive by squashing ( oshitsubushi ), if the group has one or more outer 
liberties, otherwise the group is in seki. 

A characteristic of our method is that the recognition is based on numerical 
features of the regions and groups, to which incremental computation can be ap- 
plied. The method does not use pattern matching as used in many Go programs, 
which we consider inefficient and inappropriate for incremental computation. 

The method shown in this section is only applicable to the groups enclos- 
ing closed regions. To analyse patterns with incompletely closed regions, or 
patterns with half eyes or open eyes, several methods have been proposed such 
as those by eye values and eye regions in Chen and Chen (1999) and Fotland 
(2002) and by position evaluation in Chen (2002). We are working on extending
our methodology so that it can be applied to incompletely closed regions or
loosely connected groups, e.g., those connected by bamboo joints. 

4. Finding the Number of Enclosed Regions Based on
Euler’s Formula

The regions enclosed by groups are important for deciding the life and death 
of the group, since the eyes are small enclosed regions and a group enclosing a 
region can be alive as discussed in the previous section. Nakamura (2000, 2002) 
proposed a method of using a formula to find the number of regions enclosed by
the groups. In this section, we show an improved method of finding the number 
of enclosed regions based on the method of incremental computation. The term 
“group” in this section refers to the linked group. 

For any connected planar graph, the number N of regions enclosed by edges, 
or minimal loops, is given by Euler’s formula N = n — k + 1, where n and k are 
the numbers of edges and vertices, respectively. This formula has been applied 
in computer graphics to find the number of enclosed open regions in digital 
figures, which are represented by bit arrays (Gray, 1971). Euler’s formula is 
also applied in finding “holes” in the game Lines of Action (LoA) (Winands, 
Uiterwijk, and Van den Herik, 2001). 

4.1 Application of Euler’s Formula to Go 

For the application of Euler’s formula to graphs to find the number of enclosed 
regions in Go, we consider each stone in a group as a vertex, and each “link” 
between the stones as an edge. The link is either the “adjacent-to” relation or 
the diagonal relation of two stones in the group. 

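$\#\mathit{link}(G) = \#\mathit{adjacent}(G) + |G \cap G\swarrow| + |G \cap G\searrow|$

Here $G\swarrow$ and $G\searrow$ denote $G$ shifted by one intersection in the two diagonal
directions. We assume that every stone in a group is connected to at least
one other stone in the group by a link.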

The group may contain closed loops of stones composed of three stones and
three links, i.e., three stones in a small L-shape with two adjacent-to links and
one diagonal link. To find the number of enclosed regions (or the number of
open loops), the number of the closed loops $\#\mathit{closed\_loop}(G)$
should be subtracted from the number of the minimal loops. The number of
regions enclosed by the group $G$ is given by

$\#\mathit{empty\_region}(G) = \#\mathit{link}(G) - |G| - \#\mathit{closed\_loop}(G) + 1$.

A group may contain a closed loop formed by four stones in a 2x2 square, which
contains two diagonal links. In this case, only one of the diagonal links is valid,
since Euler’s formula applies only to planar graphs. For example, the black group
in Figure 4 has 16 links including 8 diagonal links, 12 stones and 4 closed loops.
The number of enclosed regions is calculated as 16 - 12 - 4 + 1 = 1. When the
intersection A is occupied by a black stone, the numbers of links, stones and
closed loops increase by 3, 1 and 1, respectively, and the number of enclosed
regions changes to 19 - 13 - 5 + 1 = 2.

Figure 4. Black groups enclosing one region (a) and two regions (b).

Figure 5. A group in the corner enclosing one region (a) and two regions (b).
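
The formula can be checked on small groups with the following sketch. It is a minimal illustration under stated assumptions: the group is a single linked group that does not touch the edge of the board (no earth stone), and the rule that only one diagonal of a full 2x2 square is valid is realised here by scanning 2x2 windows, which is our own device and not necessarily how the method was implemented.

```python
def num_enclosed_regions(G):
    """#empty_region(G) = #link(G) - |G| - #closed_loop(G) + 1 for an interior linked group."""
    G = set(G)
    # Adjacent-to links: count each pair once, towards increasing coordinates.
    adjacent = sum(((i + 1, j) in G) + ((i, j + 1) in G) for (i, j) in G)
    # Every 2x2 window (named by its corner with the smallest coordinates)
    # that contains at least one stone of G.
    windows = {(i - a, j - b) for (i, j) in G for a in (0, 1) for b in (0, 1)}
    diagonal = 0
    closed_loops = 0
    for (i, j) in windows:
        cells = [(i, j), (i + 1, j), (i, j + 1), (i + 1, j + 1)]
        k = sum(c in G for c in cells)
        d1 = (i, j) in G and (i + 1, j + 1) in G
        d2 = (i + 1, j) in G and (i, j + 1) in G
        if d1 or d2:
            diagonal += 1          # a full square contributes only one diagonal
        if k == 3:
            closed_loops += 1      # one triangle of three stones and three links
        elif k == 4:
            closed_loops += 2      # a full square split by its single valid diagonal
    links = adjacent + diagonal
    return links - len(G) - closed_loops + 1
```

For a ring of eight stones around one empty intersection the sketch counts 12 links, 8 stones and 4 closed loops, giving 12 - 8 - 4 + 1 = 1 enclosed region.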

To apply this method to the groups on the periphery and in the corners, we consider
that there are links between stones on the edge of the board and a special virtual
stone called earth as shown in Figure 5. To find the number of the virtual links,
we first assign the set of stones on the edge $G \cap \square$ to a variable $X$. The
number of virtual links is $|X|$, and the number of closed loops with the earth is
$|X \cap \overleftarrow{X}| + |X \cap X\downarrow|$. We say that the group $G$ is earthed, if $X \ne \emptyset$. Since the
group in Figure 5 (a) has 11 links including 3 virtual links, 8 stones including
earth and 3 closed loops, the number of open loops is $N = 11 - 8 - 3 + 1 = 1$.
After placing a black stone at A, the number changes to $N = 13 - 9 - 3 + 1 = 2$.

4.2 Incremental Computation 

For an effective incremental computation of the number of enclosed regions,
we consider the change in number caused by placing a stone on an empty
intersection $p$ close to a black group $G$, i.e., there is a stone $q \in G$ such that
there is a link between $p$ and $q$. For the intersection $p = (i, j)$ not on the edge,
let $C(p)$ be the circular sequence of eight neighbour states,

$S_{i,j+1},\ S_{i+1,j+1},\ S_{i+1,j},\ S_{i+1,j-1},\ S_{i,j-1},\ S_{i-1,j-1},\ S_{i-1,j},\ S_{i-1,j+1}$

where each state $S_{x,y}$ is empty, or a black or white stone around the intersection
$p$. The change in the number of regions caused by placing a black stone at $p$
equals $E(p) - 1$, where $E(p)$ is the number of the consecutive subsequences
in $C(p)$ satisfying the following conditions.

1 Each state is either empty or a white (opponent) stone.

2 Each subsequence contains one or more elements in $\mathit{exterior}(\{p\})$.

Figure 6. Typical patterns of neighbours and changes in the numbers of enclosed regions.

The fact that the change equals $E(p) - 1$ is derived from $E(p) - 1 = n' - 1 - L'$,
where $n'$ is the change in the number of links, $L'$ is the change in the number
of the closed loops caused by adding the stone, and the subtracted 1 accounts for
the added stone. Figure 6 shows typical state patterns of neighbours and the
increments of the number of regions enclosed by a black group. The symbol o
denotes either an empty intersection or a white stone. Note that for the
intersection A in Figure 4 (a), the change is one, since $E(A) = 2$. This agrees with
the increments given above: $n' - 1 - L' = 3 - 1 - 1 = 1$.
Note also that the intersections $p$ with $E(p) \ge 2$ are considered the vital points.
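
The count E(p) for an interior intersection can be sketched as a scan over the circular sequence of the eight neighbours. This is an illustrative sketch, assuming that p is not on the edge and is linked to a black group; the function names are ours.

```python
def count_runs(p, black):
    """E(p): maximal runs of non-black neighbours containing an orthogonal neighbour of p."""
    i, j = p
    # Circular sequence of the eight neighbours; even positions are the four
    # intersections adjacent to p, odd positions are the diagonal ones.
    ring = [(i, j + 1), (i + 1, j + 1), (i + 1, j), (i + 1, j - 1),
            (i, j - 1), (i - 1, j - 1), (i - 1, j), (i - 1, j + 1)]
    free = [q not in black for q in ring]   # empty or white
    if all(free):
        return 1                            # the whole circle is a single run
    start = free.index(False)               # begin the scan just after a black stone
    runs = 0
    in_run = False
    has_adjacent = False
    for k in range(1, 9):
        idx = (start + k) % 8
        if free[idx]:
            in_run = True
            if idx % 2 == 0:
                has_adjacent = True
        else:
            if in_run and has_adjacent:
                runs += 1
            in_run = False
            has_adjacent = False
    return runs


def region_change(p, black):
    """Change in the number of enclosed regions when a black stone is placed at p."""
    return count_runs(p, black) - 1
```

Completing a diamond of four stones yields E(p) = 2 and therefore one new region, whereas filling a one-point eye yields E(p) = 0 and removes a region.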

For the intersection $p$ on the bottom edge, the number $E(p)$ is defined as the
number of consecutive subsequences in the sequence,

$S_{i-1,j},\ S_{i-1,j+1},\ S_{i,j+1},\ S_{i+1,j+1},\ S_{i+1,j}$.

The number $E(p)$ is similarly defined for other edges with different directions,
for the corners, and for the white stones. The change in the number of regions
is $E(p) - 1$, if the group is earthed, and left and right neighbour intersections
are empty. Otherwise, the change in the number of regions is $E(p) - 2$. Note
that since a stone is placed on the edge in this case, the group becomes
earthed. Figure 7 shows typical patterns of neighbours and their changes in
the number of regions. The pattern (c) represents that the change is one, if the
group is earthed, and zero otherwise. Patterns (e) and (f) represent two cases
in the corner intersection. Since the point A in Figure 5 (a) matches the pattern
(c) and the group is earthed, the change in the number of regions is one.

4.3 Problems Related to Incremental Computation 

The method shown in this section only provides the number of enclosed 
regions, but no information on the position of the regions, which is necessary
for the analysis as performed in Section 3. A practical method for determining 
the position is using the potential distribution to be described in Section 5. 

Moreover, the enclosed regions counted by the method might include false
eyes. A false eye occurs, when two blocks are connected by two diagonal
links. Hence, a region with one empty intersection enclosed by two blocks is
not a false eye, if the blocks are connected by three or more diagonal links and
virtual links (Figure 8). We can calculate the number of the connecting links
by subtracting the total number of diagonal links in the two blocks from the
number of diagonal links and virtual links in the group. Note that this rule is also
effective for determining the life and death of the dragon with two heads, i.e., a
group with two blocks connected by two false eyes. Although the condition for
unconditional life by Benson (1976) covers these groups, our rule is simpler
and appropriate for incremental computation.

Figure 7. Typical patterns of neighbours on the edge (a - d) and the corner (e, f).

Figure 8. Groups consisting of two blocks connected by diagonal links.

Although most other enclosed regions can form one or two eyes as discussed 
in Section 3, there is the case that a large enclosed region containing an opponent 
group might form no eye and/or a seki. This problem can be solved by analysing 
capturing races (Nakamura, 2001). 

5. Static Analysis by Electric Charge Model 

This section outlines how incremental computation is used in estimating
groups and territories based on the electric charge model. For a board
configuration, the potential of each intersection is defined as follows.

1 Each stone distributes potential values 1/d to intersections around this 
stone, where d is the Manhattan distance between the stone and the in- 
tersection. The potential of every intersection is the sum of the potential 
values given by all the stones nearby. The potential given by black stones 
and that by white stones are separately calculated. 

2 Stones close to the edges or the corners have their mirror images as shown
in Figure 9. Therefore, the intersections near the edges or the corners
have higher potential than intersections in the centre of the board.

3 A potential value of an intersection given by a stone is reduced, if the
intersection is in the shadow of another stone. The distance d is increased
by the value shown in Figure 10 in the calculation of the potential value.

Figure 9. An example of mirror images.

Figure 10. Values added to distances in shadows.

We use two different types of potential distributions. One is the potential distri- 
bution reflecting only the shadows of the stones of the same colour and the other 
is the potential distribution reflecting the shadows of all stones. In Subsection 
5.3 the latter type of distribution is used for recognizing groups. The first type 
is intended to be used for estimating the strength of the groups, although the 
use is not shown in this paper. 

Figure 11 shows examples of potential distributions. Because of the mirror
images, the intersections in the corner have higher potentials as shown in Figure
11(c). In general, the potential of an intersection represents the extent to
which the intersection is surrounded by stones. The potential in an enclosed
region is approximately 4 to 6 as shown in Figure 11(b), and independent of
the size and the shape of the region or of the stones of the opponent.
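
The additive nature of the potential can be illustrated by a deliberately simplified sketch. It covers rule 1 only: mirror images (rule 2) and shadows (rule 3) are omitted, the stone's own intersection is left without a value, and the cut-off at distances below 8 follows the approximation of Subsection 5.1. The function names and the dictionary representation are assumptions made for the example.

```python
def add_stone_potential(potential, stone, n=19, max_dist=8):
    """Incremental update: add 1/d around a newly placed stone (d = Manhattan distance)."""
    si, sj = stone
    for i in range(max(1, si - max_dist + 1), min(n, si + max_dist - 1) + 1):
        for j in range(max(1, sj - max_dist + 1), min(n, sj + max_dist - 1) + 1):
            d = abs(i - si) + abs(j - sj)
            if 0 < d < max_dist:
                potential[(i, j)] = potential.get((i, j), 0.0) + 1.0 / d
    return potential


def full_potential(stones, n=19, max_dist=8):
    """Full recomputation over the whole board, for comparison with the incremental update."""
    potential = {}
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            dists = [abs(i - si) + abs(j - sj) for (si, sj) in stones]
            terms = [1.0 / d for d in dists if 0 < d < max_dist]
            if terms:
                potential[(i, j)] = sum(terms)
    return potential
```

Calling `add_stone_potential` once per stone of one colour gives the same distribution as `full_potential`, while touching only the intersections near the new stone. Separate dictionaries would be kept for black and white stones.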

5.1 Incremental Computation of Potentials 

Because of the linear, additive nature of the potential, incremental compu- 
tation is generally simple, although mutual interactions of the shadows make 
the computation more complex for the configurations with many stones. We 
employ the following approximation method for computing a potential distri- 
bution. 
Figure 11. Examples of potential distribution: (a) one stone in the center; 
(b) an enclosed region in the center; (c) two stones in the corner. 



1 We restrict the area to which the potential values from a stone are dis- 
tributed to the set D of intersections whose distance from the stone is 
less than 8. A stone has mirror images only if its distance from the 
edge is less than 5. 

2 Whenever a stone is placed on an intersection: 

(a) the potential of every intersection in the area D is increased by 
the value 1/d , where d is the distance between the stone and the 
intersection; and 

(b) for each stone in the area D , the decrements by the effect of shad- 
ows are subtracted from the potentials of intersections in the two 
symmetric shadows of the two stones. 

Note that the potential at an intersection obtained by the computation method 
above is slightly different from the one given by the definition in the previous 
subsection, if the intersection is in double shadows. The errors by the approxi- 
mation, however, are negligible. 
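The incremental update of step 2(a) can be sketched as follows. This is only an illustration in Python, not the authors' C++ program: the names `neighbourhood`, `place_stone`, and `remove_stone` are hypothetical, the board size of 19 is assumed, and the mirror images and the shadow decrements of step 2(b) are deliberately left out because the values of Figure 10 are not reproduced in the text.

```python
from collections import defaultdict

RADIUS = 8  # potential is spread only to intersections with distance < 8


def neighbourhood(stone, size=19, radius=RADIUS):
    """Yield (intersection, d) for every on-board point with 0 < d < radius."""
    r, c = stone
    for dr in range(-radius + 1, radius):
        for dc in range(-radius + 1, radius):
            d = abs(dr) + abs(dc)  # Manhattan distance
            rr, cc = r + dr, c + dc
            if 0 < d < radius and 0 <= rr < size and 0 <= cc < size:
                yield (rr, cc), d


def place_stone(potential, stone, size=19):
    """Incrementally add the 1/d contribution of one new stone."""
    for point, d in neighbourhood(stone, size):
        potential[point] += 1.0 / d


def remove_stone(potential, stone, size=19):
    """Undo place_stone, e.g. when a stone is captured."""
    for point, d in neighbourhood(stone, size):
        potential[point] -= 1.0 / d


# One distribution per colour, as in the text; only black is shown here.
black = defaultdict(float)
place_stone(black, (9, 9))
place_stone(black, (9, 11))
print(black[(9, 10)])  # adjacent to both stones: 1/1 + 1/1
```

Because the contributions are additive, each move touches only the intersections in the area D of the new stone, which is what keeps the cost per move bounded.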
Figure 12. Computation time for potential distributions by computing the difference. 

Figure 12 shows a graph of the computation time of the potential distributions 
for each move. For the experiment, we used a 1 GHz Pentium III processor 
running a program written in C++. The computation results include 
four potential distributions, i.e., the two types of distributions for both 
black and white stones. The time increases with the number of stones 
while that number is below approximately 50, and is almost constant 
thereafter. 

5.2 Recognition of Groups and Territories by Potential 
Distributions 

Let B(i,j) be the potential of an intersection (i,j) given by black stones, and 
W(i,j) that given by white stones. We use the potential distribution that has the 
effects of shadows by both black and white stones. The procedure for recognizing 
black groups is as follows. 

1 First, select group points from a given configuration by the following 
rules. 

(a) The intersection occupied by a black stone is a black group point. 

(b) An empty intersection is a black group point if $B(i,j) > v_1$ and 
$B(i,j) - W(i,j) > v_2$, where $v_1$ and $v_2$ are parameters. 

The white group points are selected similarly. Based on many experi- 
ments, we determined the parameters as $v_1 = 1.0$, $v_2 = 0.55$. 

2 Determine connected sets of group points of the same colour (Note that 
this process is similar to that for determining blocks). The set of stones in 
each of the connected sets is a group. The connected set of group points 
represents the influence range, or the territory, of the group. 
Figure 13. An example of groups and their territories derived from potential distributions. 

3 In some cases, groups connected by diagonal, or kosumi, relations are 
not recognized by the rules above alone. Hence, it is necessary to unify 
these groups into one by finding stones with this relation. 
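Steps 1 and 2 of the procedure above can be sketched as follows. This is an illustrative Python version under stated assumptions: potentials are passed in as dictionaries keyed by (i, j), the function names are hypothetical, connectivity is taken to be 4-neighbour adjacency, and the kosumi unification of step 3 is not implemented. Connected sets that contain no stone are simply dropped when groups are extracted.

```python
V1, V2 = 1.0, 0.55  # thresholds reported in the text


def black_group_points(B, W, black_stones, white_stones, size):
    """Rule 1: black stones, plus empty points where black potential dominates."""
    points = set(black_stones)
    for i in range(size):
        for j in range(size):
            p = (i, j)
            if p in black_stones or p in white_stones:
                continue  # rule (b) applies to empty intersections only
            b, w = B.get(p, 0.0), W.get(p, 0.0)
            if b > V1 and b - w > V2:
                points.add(p)
    return points


def connected_sets(points):
    """Step 2: split the group points into 4-connected sets (flood fill)."""
    remaining = set(points)
    sets = []
    while remaining:
        stack = [remaining.pop()]
        component = set(stack)
        while stack:
            i, j = stack.pop()
            for n in ((i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)):
                if n in remaining:
                    remaining.remove(n)
                    component.add(n)
                    stack.append(n)
        sets.append(component)
    return sets


def groups(point_sets, stones):
    """The stones inside each connected set form one group."""
    return [s & stones for s in point_sets if s & stones]


# Hand-made potentials on a 5 x 5 board, purely for illustration.
black = {(1, 1), (1, 3), (4, 4)}
white = {(3, 0)}
B = {(1, 2): 2.0, (2, 2): 1.2, (3, 1): 1.1}
W = {(2, 2): 0.9, (3, 1): 0.2}
points = black_group_points(B, W, black, white, 5)
found = groups(connected_sets(points), black)
print(sorted(sorted(g) for g in found))
```

In this toy position the empty point (1, 2) joins the stones at (1, 1) and (1, 3) into one group, while (2, 2) fails the second threshold because the white potential there is too close to the black one.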

By testing this method for various configurations in games by professional 
players, we found that incremental computation can correctly recognize most 
groups for the wide range of configurations with more than 20 stones. Figure 
13 shows an example of groups in a game (Black: C. Chou and White: M. 
Takemiya, 1994) evaluated by incremental computation. The dark gray areas 
represent the white territories, and light gray areas the black territories. 

It is generally not difficult to compute the difference when finding the groups, 
since a group usually expands with each move and the changes of the group points 
are restricted to intersections near the point of the move. A group can, however, 
be cut and separated by erroneous move(s) or ko threats. 
This case will be investigated in future research. 

5.3 Comparison with Other Approaches 

Most Go programs employ some method of evaluating the influence of stones 
and/or finding territories, including the potential distribution (Zobrist, 1969), the 
5/21 algorithm (Bouzy, 1995), and the heuristic territory evaluation by Müller 
(2002). A feature of our approach is that each stone distributes a potential 
of 1/d to the neighbouring intersections at distance d. This methodology is 
common to several Go programs, including Handtalk (Chen, 2002), 
Go-Intellect (Chen, 1989), and Jimmy 5.0 (Yan and Hsu, 2001), in the 
sense that each stone distributes, or radiates, some influence values to neigh- 
bouring intersections. These programs, however, employ different methods for 
calculating the values, except Handtalk, in which the distribution of values 
is similar to our 1/d distribution. For example, Go-Intellect uses 
the exponential function exp(-d) instead of 1/d. 

A unique feature of our method in addition to incremental computation is 
that the influences of black stones and white stones are separately calculated. 
This is different from most other programs, in which the influence values by 
black (or white) stones are subtracted from those by white (black) stones to 
form a single distribution of influence values. 

Another unique feature of our method is that it uses the mirror images, the 
shadows, and four kinds of potential distributions to describe some aspects of 
Go boards in detail. With these features, the potential values of every intersection 
on the board represent how strongly black and white stones surround the 
intersection. Note that this property is based on the potential given by 1/d and 
the mirror images. 

6. Concluding Remarks 

In this paper, we discussed incremental computation for static analysis in Go 
programming. The main questions 
were: (1) how can incremental computation be applied to static analysis, 
(2) how much does the computation speed increase with incremental computa- 
tion, and (3) what sort of analysis is suitable or unsuitable for incremental 
computation? We showed applications of our method to static analysis in Go 
programming, including: 

■ identifying the life and death of a group enclosing a region by numerical 
features, which are described by the operations on sets of intersections; 

■ finding the number of regions enclosed by a group based on Euler’s 
Formula; and 

■ estimation of groups and territories by potential distributions based on 
the electric charge model. 

The analysis methods are based on numerical features or values, and not on 
pattern matching. Most notions in the static analysis and incremental com- 
putation are mathematically defined by the operations on sets of intersections. 
We showed that incremental computation can be used for the operations in the 
analysis. 

The author and his colleagues are implementing the methods described above 
in a Go-playing program in Prolog and C++. There is still some work to be 
done before we can satisfactorily answer the questions above. Future problems 
include: 

■ finding numerical features effective for identifying alive-and-dead pat- 
terns of loosely connected groups, especially those in the corners; 

■ developing a method of acquiring a broad class of alive-and-dead patterns 
and making a database efficiently; and 

■ developing faster algorithms for the analysis, especially for incremental 
computation of the potential distributions. 
Abstract In 1993, the Chinook team completed the computation of the 2- through 8-piece 
checkers endgame databases, consisting of roughly 444 billion positions. 
Until recently, nobody had attempted to extend this work. In November 2001, 
we began an effort to compute the 9- and 10-piece databases. By June 2003, 
the entire 9-piece database and the 5-piece versus 5-piece portion of the 10-piece 
database were completed. The result is a 13 trillion position database, compressed 
into 148 GB of data organized for real-time decompression. This represents the 
largest endgame database initiative yet attempted. The results obtained from these 
computations are being used to aid an attempt to weakly solve the game. This 
paper describes our experiences working on building large endgame databases. 

Keywords: Retrograde analysis, endgame databases, checkers 

1. Introduction 

Endgame databases have had an enormous impact in computer games re- 
search. They have been instrumental in building world championship pro- 
grams (e.g., the World Man-Machine Checkers Champion Chinook (Schaef- 
fer, 1997)), solving games (e.g., Nine Men’s Morris (Gasser, 1996) and Awari 
(Romein and Bal, 2002, 2003)), and uncovering new insights into games. 

For converging games, where the number of pieces on the board reduces 
as the game progresses, larger endgame databases are a performance asset 
to a game-playing program, both in terms of reducing the size of the search 
tree and by replacing heuristic evaluations with perfect knowledge. However, 
there are practical considerations to building large databases, including the time 
required to compute them, and the resulting size of the (compressed) databases. 
Few researchers and developers have the expertise, motivation, patience, and 
computing resources to push database technology to its limit (a recent exception 
is the solution to the game of Awari (Romein and Bal, 2002, 2003)). This 
means, for example, that the 6-piece chess endgame databases are unlikely to 
be completed in the near future. 
Chinook is the World Man-Machine Checkers Champion (Schaeffer, 1997).¹ 
The 8-piece endgame databases were a critical part of the program’s success 
against the top human players. The databases contained secrets that were well 
beyond the understanding of even the premier players in the world. These 
databases were started in 1989 and completed in 1993 — 444 billion positions 
compressed into 5.6 GB of data. These numbers may seem small by today’s 
standards, but were impressive back in the early 1990s when a state-of-the-art 
CPU was an Intel 486, 32 MB was considered to be a lot of memory, and 1 GB 
disks were new technology and very expensive. 

Beginning in November 2001, we started production runs for computing the 
9- and 10-piece checkers endgame databases. The databases are not needed 
to improve the playing strength of checkers programs; there are currently at 
least five checkers programs that are superior to all human players. Rather, 
there is a more enticing goal: solving the game of checkers (or, more precisely, 
weakly solving the game (Allis, 1994)). The total search space for the game is 
$5 \times 10^{20}$, a seemingly prohibitively large number. However, most of the search 
space is likely to be irrelevant to the proof, and resulting estimates of the proof- 
tree size are well within what is possible to compute with current technology. 
Building the 10-piece databases (specifically the key 5-piece versus 5-piece 
subset, where each side has the same number of pieces) is a key stepping stone 
to solving checkers. 

This paper describes our experiences building the 9- and 10-piece checkers 
databases. The task was daunting, given the need for 64-bit addressing, large 
computations (up to 171 billion positions at a time), large intermediate disk 
needs (over 1 TB), verification of the results, and fault tolerance. In 10 years, 
these numbers will seem trivial, but the techniques will be useful for the next 
large database computation. 

This paper makes the following contributions: 

1 the practical considerations that complicate any long-term data-intensive 
computation, 

2 the system issues that need to be addressed, including memory con- 
straints, concurrency, compression, and fault tolerance, 

3 improved data compression techniques, 

4 data on the 9- and 10-piece checkers databases, and 

5 speculation on the likelihood of solving checkers in the near future. 

Section 2 describes the algorithms used to compute the 8-piece databases. 
Section 3 discusses the enhancements needed to move to the larger 10-piece 
databases. The results from building the databases and the implications for 
solving the game of checkers are in Section 4. Section 5 concludes with per- 
spectives on building larger databases. 

¹ There are over 100 checkers variants. The variant used here is played on an 8 x 8 board and is popular in 
the former British Commonwealth and in North America. So-called International Checkers is played on a 
10 x 10 board and is popular in Russia, Europe, and Africa. 

2. Algorithms 

The important application-specific properties that influence the database al- 
gorithms are (Goldenberg et al., 2003) (the “Properties”): 

1 The game starts with 12 white and 12 black checkers on the board. 

2 A captured piece is removed from the board and cannot return. 

3 Checkers can be promoted to become kings (when the checker moves to 
the back rank of the opponent). 

4 Checkers move forward; kings move forward and backward. 

The algorithms used for the checkers computation are updated versions of 
those used to compute the Chinook 8-piece databases (Lake et al., 1994). 
This code had not been touched since the completion of the databases in 1993. 

The most common format of an endgame database stores for each position a 
distance metric. This metric is typically either the number of moves to win (if 
appropriate) or the number of moves to convert to another database. This level of 
detail is tremendously useful in practice since it allows a game-playing program 
to play the “best” database moves without needing any search. However, this 
representation requires (at least) a byte of data per position, and the resulting 
database does not compress well. The philosophy adopted for building checkers 
databases has been to build the largest databases possible. To do this necessitates 
storing the minimal amount of information per position in the database — 
recording only whether a position is a win, a loss or a draw. The result facilitates 
the creation of large endgame databases that compress extremely well. 

For database calculations, each position is represented by 2 bits, representing 
the values win (W), loss (L), at least a draw (D), and unknown (U). Using D 
to mean at-least-a-draw instead of exactly a draw is useful, since it reduces 
the amount of disk I/O done by the program (see the Lookups phase described 
below). A portion of the endgame database (a slice) is computed by resolving 
all positions as wins, losses or draws. The final result is compressed, verified, 
and then added to the master copy of the completed databases. 
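A 2-bit-per-position working array of this kind can be sketched as below. The numeric codes assigned to W, L, D, and U and the class name `SliceValues` are assumptions made for illustration; the paper does not state the actual bit patterns or data layout used in the construction program.

```python
W, L, D, U = 0, 1, 2, 3  # assumed 2-bit codes; the actual assignment is not given


class SliceValues:
    """Working storage for one slice: four position values per byte."""

    def __init__(self, n_positions):
        self.n = n_positions
        # 0xFF is four copies of binary 11, so every position starts as U.
        self.data = bytearray([0xFF]) * ((n_positions + 3) // 4)

    def get(self, index):
        shift = (index & 3) * 2
        return (self.data[index >> 2] >> shift) & 3

    def set(self, index, value):
        shift = (index & 3) * 2
        cleared = self.data[index >> 2] & ~(3 << shift) & 0xFF
        self.data[index >> 2] = cleared | (value << shift)


values = SliceValues(10)
values.set(5, W)   # a position resolved as a win
values.set(6, D)   # a position known to be at least a draw
print([values.get(i) for i in range(10)])
```

Keeping U and D as distinct codes is what lets a later pass revisit only the positions whose value may still improve.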

The 10-piece databases are huge (8.5 trillion positions for just the 5-piece 
versus 5-piece subset), and it is not practical to do the entire calculation as one 
big computation. Instead, the problem is broken down into smaller slices that 
can be solved more easily. The databases are broken down as follows: 

■ By pieces: The N-piece database can be computed once the (N-1)-piece 
database is done (by Property #2). 

■ By material: An N-piece database is further divided so that subsets with a 
different number of pieces per side can be computed in parallel (Property 
#2). For example, in the 9-piece database computation, the 8 pieces 
versus 1, 7 versus 2, 6 versus 3, and 5 versus 4 subsets can be computed 
in parallel. 

J. Schaeffer, Y. Björnsson, N. Burch, R. Lake, P. Lu, S. Sutphen 

■ By number of kings: The material division is further broken down by 
the number of kings for each side (exploiting Property #3). For example, 
after 5 kings versus 4 kings have been computed, then the subset 4 kings 
and 1 checker versus 4 kings can be computed (the one checker might 
promote, thus the 5 king versus 4 king database must be computed first). 

■ By leading rank: A sub-database is further sliced into pieces by consid- 
ering the position of each side’s most advanced (leading) checker (from 
ranks 1 to 7). Positions where the leading checker is on rank R must 
be computed before those where the leading checker is on rank R - 1 
(Property #4). For example, in the 4 kings and 1 checker versus 4 kings 
endgame, all positions where the checker is on the seventh rank must be 
computed before tackling all positions where the checker is on the sixth 
rank. For databases where each side has a checker, this technique results 
in dividing the computation into 49 (not-necessarily-equal) slices, dra- 
matically reducing the size of the biggest computation to be performed. 

More details on the decomposition can be found in Lake et al. (1994). 

Table 1 shows how the 5-piece versus 5-piece subset of the 10-piece database 
can be subdivided into smaller pieces. The first column gives the number of 
kings and checkers for the sub-database using the notation “bk wk bc wc”, 
where bk is the number of black kings, wk is the number of white kings, bc is 
the number of black checkers and wc is the number of white checkers. The 8.5 
trillion positions are divided into 21 subsets based on the number of kings and 
checkers. The 3223 subset (3 kings and 2 checkers for black; 2 kings and 3 
checkers for white) is the largest, with roughly 1.6 trillion positions. This is 
subdivided into 49 slices based on the leading checker. 

Database      Total Positions   Slices 
5500           16,257,084,480        1 
5401          142,249,489,200        7 
5302          247,789,432,800        7 
5203          214,750,841,760        7 
5104           92,565,018,000        7 
5005           15,868,288,800        6 
4411          311,375,610,000       28 
4312        1,085,553,705,600       49 
4213          941,518,468,800       49 
4114          406,152,630,000       49 
4015           69,686,136,000       42 
3322          946,853,107,200       28 
3223        1,643,753,217,600       49 
3124          709,688,460,000       49 
3025          121,877,184,000       42 
2233          714,003,388,800       28 
2134          617,101,500,000       49 
2035          106,080,312,960       42 
1144          133,467,390,552       28 
1045           45,934,129,104       42 
0055            3,956,576,472       21 
Total       8,586,481,972,128      630 

Table 1. Database slices for 10-piece database (5 versus 5 pieces). 

The largest slices in the 5-piece versus 5-piece subset of the 10-piece 
database are shown in Table 2. To specify a slice, we use the notation 
“bk wk bc wc.br wr”, where br is the rank of the leading black checker and 
wr is the rank of the leading white checker. The largest slice is 171 billion 
positions (3223.77 with black to move and its mirror database 2332.77 with 
white to move). Using 2 bits per position, this slice requires almost 40 GB of 
storage during its computation phase. In total, there are only 9 slices that have 
a size of over 100 billion positions. 

Slice                        Size              Total 
3223.77/2332.77    85,515,674,400  x2 =  171,031,348,800 
2233.76/2233.67    73,228,209,600  x2 =  146,456,419,200 
3223.67/2332.76    71,823,866,400  x2 =  143,647,732,800 
3223.76/2332.67    59,656,240,800  x2 =  119,312,481,600 
3322.76/3322.67    58,741,300,800  x2 =  117,482,601,600 
3223.57/2332.75    58,132,058,400  x2 =  116,264,116,800 
2134.77/1243.77    56,491,266,000  x2 =  112,982,532,000 
2233.77           104,558,625,600     =  104,558,625,600 
3223.66/2332.66    50,304,477,600  x2 =  100,608,955,200 

Table 2. Largest 10-piece database slices. 
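The Slices column of Table 1 and the storage figure for the largest slice can be cross-checked with a short calculation. The counting rule below is an inference, not something stated in the paper: it assumes checkers occupy ranks 1 to 7 with four squares per rank (so five checkers force a leading rank of at least 2), and that slices with identical material on both sides are merged with their mirror images. The function names are hypothetical, and "GB" is assumed to mean 2^30 bytes.

```python
def leading_rank_choices(checkers):
    """Number of possible leading ranks for one side (assumed counting rule)."""
    if checkers == 0:
        return 1                   # no checker, nothing to slice on
    lowest = -(-checkers // 4)     # ceil(checkers / 4): 4 squares per rank
    return 7 - lowest + 1          # checkers occupy ranks 1..7


def subsets(pieces_per_side=5):
    """Yield ("bk wk bc wc" label, slice count) for each material subset."""
    for bk in range(pieces_per_side, -1, -1):
        for wk in range(bk, -1, -1):
            bc, wc = pieces_per_side - bk, pieces_per_side - wk
            b, w = leading_rank_choices(bc), leading_rank_choices(wc)
            if bk == wk:           # same material: mirror slices are merged
                slices = b * (b + 1) // 2
            else:
                slices = b * w
            yield f"{bk}{wk}{bc}{wc}", slices


table = dict(subsets())
print(len(table), sum(table.values()))  # number of subsets, total slices

# Working storage for the largest slice at 2 bits per position.
largest = 171_031_348_800
working_bytes = (largest * 2 + 7) // 8
working_gib = working_bytes / 2**30
print(round(working_gib, 1))
```

Under these assumptions the rule reproduces the 21 subsets and 630 slices of Table 1, and the largest slice needs a little under 40 GB of working storage, in line with the figure quoted in the text.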

Note that slices can be further sub-divided. Gil Dodgen and Ed Trice (2002) 
have experimented with using both the rank of the leading checker and the 
configuration of checkers on the first rank to achieve further subdivisions. The 
finer granularity of the slices reduces the RAM needs and increases the compu- 
tation’s concurrency. For the work reported here, additional subdivisions were 
not needed. However, with current technology they might be needed if one 
wanted to compute the 11-piece databases (currently not in our plans). 

The endgame database solving programs were designed with the following 
objectives in mind: reduce the amount of disk I/O needed, reduce the memory 
requirements for the largest jobs, and use as many machines as possible. The 
computation of a database slice consists of 5 phases. The phases iterate over 
the data, where each position value in the slice has been initialized to unknown 
(U). The database construction phases are summarized in Table 3. 

1 Captures: The rules of checkers require that a capture move, if present 
in a position, must be played. A capture move removes one or more 
pieces from the board. All capture moves are looked up in previously 
computed databases and the maximum of the resulting values (W/L/D) 
is assigned to the position. For an N-piece database calculation, this 
phase only requires the 2 through (N-1)-piece databases. This is important 
because the (N-1)-piece databases are considerably smaller than the N- 
piece databases. For example, the 9-piece databases are only 18 GB in 
size. Thus the capture phase for all 10-piece database calculations can 
be computed well in advance of when the data is needed. 

2 Lookups: The databases are sliced according to the leading checker. 
When the leading checker advances, it will result in a position that has 
already been computed. The Lookups phase resolves all moves by the 
leading checker. By handling this I/O in a separate phase, we can guar- 
antee that the next phase (non-captures) does not have to access any 
previously computed database results. 

The advance of the checker may result in the current position being re- 
solved as a win. In rare cases the only moves possible in a position are 
those of the leading checker. If all these moves lead to losing positions, 
then the current position can be resolved as a loss. If the leading checker 
advances and the resulting position leads to a draw, then we have a lower 
bound on the value of the position. The position might still be a win 
(a king move or non-leading checker move could lead to a winning po- 
sition). Thus, if a leading checker move results in a draw score, this 
position is marked as a D, but with the semantics being that the value is 
≥ a draw. For this phase, only the N-piece database is needed (but, as 
explained below, because of the compression scheme used, the 2 through 
(N-1)-piece databases might also be required). 

3 Non-captures: The preceding phases resolved all requests for infor- 
mation from previously computed database slices. In the non-captures 
phase, only moves by kings and non-leading checkers are considered. 
Hence there is no need to access the previously-computed databases. 
In contrast to the previous phases, the non-captures phase is compute- 
intensive. 

This phase iterates over all positions in the slice, skipping over capture 
positions (their values are fixed) and W/L positions (their value cannot 
change). Only unresolved positions and draw positions are considered; 
the former to discover whether the position is a W/L/D and the latter 
to see if the D can become a W. This phase only resolves wins and losses. 
When no more changes occur during an iteration, the non-captures phase 
is complete. Any position that has a U or D value must be a real draw. 

This phase may require iterating over the data 100 or more times (the 
maximum number of ply needed to force a winning position into another 
database slice). To reduce the cost, the program iterates over all positions 
until a “small” number of changes occurs in an iteration. The positions 
that change value are saved in a queue. For subsequent iterations, the only 
positions whose value can be resolved are those that are a predecessor of 
a queue position. 

4 Compression: The endgame databases are needed in a real-time search- 
ing program (such as Chinook). Hence the data has to be compressed in 
a way that supports real-time decompression. The compression scheme 
used is described in Section 3.3. 
Name          Needs  Description                        Databases  Values   Time 
                                                        Used       Set      (%) 
Captures      I/O    Resolve capture moves.             2 to N-1   W,L,D    15 
                     Sequential pass over the data. 
Lookups       I/O    Resolve non-capture moves that     2 to N     W,L,≥D   24 
                     result in database positions. 
                     Sequential pass over the data. 
Non-Captures  CPU    Resolve non-capture moves.         None       W,L      20 
                     Repeated passes over the data, 
                     both sequential and random 
                     access, until no more changes. 
Compress      I/O    Convert to final compressed        None       D        1 
                     format. 
                     Sequential pass over the data. 
Verify        I/O    Verify that the new results        2 to N     None     40 
                     are consistent internally and 
                     with pre-existing databases. 
                     Sequential pass over the data. 

Table 3. Database construction summary. 

5 Verification: Errors are a fact of life in any long-running computation. 
Since one result depends on another, it is critical that the computations 
be verified for correctness. There is an easy way to do this: after the non- 
captures phase, a quick scan of the data can verify if the resulting set of 
values is internally consistent (self-consistency). This is quick, but does 
not catch all possible errors. Instead, our verification phase operates not 
on the 2-bit-per-position representation but on the compressed database. 
All positions are checked for consistency not only within the 
slice, but also with respect to previously computed data. The latter point 
dramatically increases the cost of the verification, but can find errors not 
caught by the fast scheme. Besides, it makes it easier to sleep at night! 

The database construction phases are summarized in Table 3. The time 
column is a generic average that represents the percentage of wall clock time 
spent in each phase. These numbers can vary significantly depending on the 
data set used. The verification phase is the most expensive since, in effect, it 
has to repeat most of the work done in the previous phases. 
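The fixed-point iteration of the non-captures phase (phase 3 above) can be illustrated on a toy position graph. This is a minimal sketch rather than the construction program: positions are plain labels, `moves` lists only the in-slice successors of each position, values are from the point of view of the side to move, and the queue optimisation described in the text is not modelled. A position with no moves is treated as a loss for the side to move, as in checkers.

```python
W, L, D, U = "W", "L", "D", "U"


def resolve(moves, value):
    """Repeat passes until nothing changes; only wins and losses are set."""
    changed = True
    while changed:
        changed = False
        for pos, successors in moves.items():
            if value[pos] in (W, L):
                continue                      # already final
            results = [value[s] for s in successors]
            if L in results:
                value[pos] = W                # some move leaves the opponent lost
            elif value[pos] == U and all(r == W for r in results):
                value[pos] = L                # every move lets the opponent win
            else:
                continue
            changed = True
    for pos in value:
        if value[pos] == U:
            value[pos] = D                    # whatever is left is a real draw
    return value


moves = {
    "a": ["b"],        # a can move to b
    "b": [],           # b has no moves: a loss for the side to move
    "c": ["d"],        # c and d form a cycle
    "d": ["c"],
    "e": ["a", "c"],
    "f": ["a"],
    "g": ["a"],        # g was marked "at least a draw" by the lookups phase
}
value = {pos: U for pos in moves}
value["g"] = D
print(resolve(moves, value))
```

Position g shows why D means "at least a draw": its only in-slice move loses, yet it must not be downgraded to a loss, because the draw was already secured by a leading-checker move in the lookups phase.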

The breakdown of the computation into multiple phases assists in planning 
how to effectively acquire and use computing resources. The captures, lookups, 
and verification phases are I/O bound. These phases need to be run on machines 
with a minimum of 300 GB of disk storage, and they benefit from the fastest 
possible disk drives. The non-capture phase is compute bound and should be 
run on the fastest available processor. This phase is easily parallelized, and the 
performance scales well to a large number of processors on a shared-memory 
computer. 
3. Moving from Eight to Ten Pieces 

This section discusses the issues that had to be addressed to enhance the 
Chinook database calculations to accommodate the larger size of the 10-piece 
databases. 

3.1 64-bit Indices 

By subdividing the databases into slices, the original Chinook code could 
get by with using 32-bit numbers for position indices. For the 10-piece databases, 
the largest individual slice was 104 billion positions (the symmetric database 
2233.77), necessitating at least 37 bits for addressing. 

The Chinook code was converted to use 64-bit indices. By-and-large this 
was easy to do, but there were some subtleties that were initially overlooked. 
For example, most C compilers do automatic conversion between 32- and 64- 
bit numbers (both ways), possibly losing precision (and usually not getting a 
compiler warning). Another danger was intermediate expression results. Some 
expressions combined 32- and 64-bit data with implicit data conversions that 
could lead to errors. 
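Both points can be checked with a little arithmetic. The sketch below uses Python, whose integers never overflow, so 32-bit storage is emulated with a mask; the helper `as_uint32` and the operand values in the second half are hypothetical and serve only to show how an intermediate result can silently wrap in C.

```python
# Bits needed to index the largest individual slice (2233.77 in the text).
largest_slice = 104_558_625_600
bits_needed = (largest_slice - 1).bit_length()   # indices run from 0 to n - 1
print(bits_needed)

MASK32 = 0xFFFF_FFFF


def as_uint32(x):
    """Emulate storing x in an unsigned 32-bit variable, as C would."""
    return x & MASK32


# A product of two values that each fit comfortably in 32 bits ...
block, block_size = 300_000, 300_000
full = block * block_size                 # what 64-bit arithmetic gives
wrapped = as_uint32(block * block_size)   # what a 32-bit intermediate would hold
print(full, wrapped, full == wrapped)
```

In C the usual remedy is to widen one operand before the multiplication, so that the whole expression is evaluated in 64 bits rather than converted afterwards.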

Note that simply converting all numbers to use 64 bits was not an option. The 
tables used for computing position indices occupy a lot of memory. Using 32- 
bit numbers wherever possible reduced the memory footprint of the program, 
freeing up more space for disk caching. 

3.2 64-bit File Sizes 

When we started the project, support for 64-bit file sizes was not fully inte- 
grated in Linux. However, we were fortunate in that the experimental kernels 
we used fully supported the two routines that we needed: open64 and lseek64. 
Limited support for large files has hindered other groups wanting to build large 
databases on Windows platforms. 

3.3 Compression 

Many endgame databases associate a distance metric with a database position 
(the number of moves to win or the number of moves to convert to another 
database slice). For checkers, this was impractical. Our goal was to build the 
largest database possible. For this to happen, disk space and the execution 
overhead of accessing the data could not be a limitation. For example, if a 
byte was associated with each of the 13 trillion database positions computed, 
then 13 TB of disk would be needed. Even a generous 10:1 compression ratio 
would still leave the database size at an awkward 1.3 TB. The large disk size 
will dramatically slow down database computations since it will be difficult 
to achieve spatial and temporal disk locality (this was elegantly addressed for 
smaller databases by Lincke and Marzetta (2000)). 

Allowing only win-loss-draw values in the database enables 5 position values 
to be encoded in a byte ($3^5 = 243 < 256$). Using this trivial compression would 
result in 13 trillion positions being encoded into 2.6 TB. This is still too large 
(and expensive) to be practical. Further data compression is needed. 
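The five-values-per-byte encoding is simply a base-3 number. The sketch below assumes the digit assignment 0 = win, 1 = loss, 2 = draw and the function names `pack5` and `unpack5`; the paper does not specify either.

```python
def pack5(values):
    """Pack five ternary values (0 = W, 1 = L, 2 = D) into one byte, 0..242."""
    assert len(values) == 5 and all(v in (0, 1, 2) for v in values)
    byte = 0
    for v in reversed(values):
        byte = byte * 3 + v       # values[0] ends up as the lowest digit
    return byte


def unpack5(byte):
    """Recover the five ternary values from a byte produced by pack5."""
    values = []
    for _ in range(5):
        byte, v = divmod(byte, 3)
        values.append(v)
    return values


print(pack5([2, 2, 2, 2, 2]))              # largest code used: 3**5 - 1
print(unpack5(pack5([0, 1, 2, 1, 0])))

# Size of 13 trillion positions at five positions per byte, in terabytes.
print(13e12 / 5 / 1e12)
```

The byte values 243 to 255 are never produced by `pack5`, which is what leaves room for the run-length codes discussed next.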

The data has to be available for use in a real-time search. Hence any com- 
pression scheme has to support rapid real-time decompression. The databases 
were compressed by using two techniques: removing information that can be 
easily re-computed, and run-length encoding. 

Any position in which either side, if on move, could make a capture had 
its result removed from the database (i.e., capture and threatened cap- 
ture positions). It is easy to re-compute the value of a capture position: play 
the capture move(s) and look up the resulting position(s) in the database. Re- 
moving values for positions where a capture is threatened is more problematic. 
To re-compute this value, the side to move must try all possible moves and, in 
some cases, in the resulting position the opponent has a forced capture or there 
is a threatened capture — all these positions must be looked up in the database. 
Hence positions with a threatened capture may require an expensive search to 
resolve. It quickly became clear that with our compression algorithms, sim- 
ply removing capture position values was not good enough; we had to remove 
threatened capture positions to make the compressed database size reasonable. 
Our estimate is that removing threatened capture positions improves the com- 
pression by a factor of 4. 

All capture and threatened capture positions had their value replaced by the 
dominant value in the database slice. Then run-length encoding would be used 
to compress the data. The original Chinook algorithm encoded 5 positions 
into a byte (Lake et al., 1994). That left 13 values for the run-length encoding 
(256 - 3^5 = 13). These values were used to represent runs of the dominant 
value, for runs of length 10 to 3,200. For example, a database slice might be 
dominated by wins. The capture and threatened capture positions (typically 
75% of the positions) would have their values replaced by a win. Run-length 
encoding would find many long stretches of wins and encode them into one (or 
a few) bytes. 
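
A minimal Python sketch of this byte-level scheme follows. The text states only that the 13 spare byte values stood for runs of length 10 to 3,200; the specific table `RUN_LENGTHS`, the greedy choice of the longest fitting run, and the function names are assumptions made for illustration.

```python
# Hypothetical run-length table: only the endpoints 10 and 3,200 come from the text.
RUN_LENGTHS = [10, 15, 20, 25, 30, 40, 50, 100, 200, 400, 800, 1600, 3200]
FIRST_RUN_CODE = 3 ** 5  # byte values 243..255 are free for run codes


def encode(values, dominant):
    """Encode a list of values in {0, 1, 2}; runs of `dominant` get run codes."""
    out = bytearray()
    i, n = 0, len(values)
    while i < n:
        # Measure the run of the dominant value starting at position i.
        run = 0
        while i + run < n and values[i + run] == dominant and run < RUN_LENGTHS[-1]:
            run += 1
        if run >= RUN_LENGTHS[0]:
            # Emit the longest run code that fits inside the measured run.
            k = max(j for j, length in enumerate(RUN_LENGTHS) if length <= run)
            out.append(FIRST_RUN_CODE + k)
            i += RUN_LENGTHS[k]
        else:
            # Fall back to packing five values into one byte (padded at the end).
            group = list(values[i:i + 5])
            group += [dominant] * (5 - len(group))
            byte = 0
            for v in reversed(group):
                byte = byte * 3 + v
            out.append(byte)
            i += 5
    return bytes(out)


def decode(data, dominant, count):
    """Invert encode(); `count` trims any padding added to the last group."""
    values = []
    for byte in data:
        if byte >= FIRST_RUN_CODE:
            values.extend([dominant] * RUN_LENGTHS[byte - FIRST_RUN_CODE])
        else:
            for _ in range(5):
                byte, digit = divmod(byte, 3)
                values.append(digit)
    return values[:count]
```

In this sketch a slice of 3,216 values that is almost entirely wins shrinks to 4 bytes, which shows why lop-sided slices compress so well.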

The original Chinook databases, 444 billion positions (all the 2 through 
8-piece databases), were compressed into 5.6 GB. This works out to an average 
of roughly 77 positions encoded in a byte. This is misleading since the lop- 
sided databases (e.g., 6 pieces versus 2) compress very well (they are almost 
all wins for the strong side), whereas the even material databases (e.g., 4 pieces 
versus 4 pieces) have a mix of win, loss and draw values, resulting in poorer 
(but still good) compression. The 4 pieces versus 4 pieces database averaged 
22 positions per byte. 

For the 10-piece databases, our initial estimates were that the above scheme 
would result in a final database size of 400 GB. Thus it was important to find 
a better compression scheme. The new algorithm is based on Huffman coding 
and consists of the following steps: 

1 Replace capture and threatened capture positions with the W/L/D value 
that continues the current run. 

2 Convert the above into a string of (W/L/D, run-length) pairs. There will 
not be two consecutive runs with the same first value. 

3 Predict the value of a run based on the value of the run before the previous 
run. For example, given runs (draw, X) and (loss, Y) we would predict 
the value of the next run to be draw. The prediction is correct roughly 
95% of the time. Now convert the string so that a (value, length) pair 
simply becomes length, preceded by a special miss symbol if the value 
is not correctly predicted. 

4 If a maximum run length of N is chosen, we then have N - 1 length 
symbols, one escape symbol that states that an integer length follows, and 
one symbol that states that the value of this run is predicted incorrectly. 
Given the frequencies of these symbols, an optimal length-limited 
prefix-free code (a length-limited Huffman code (Turpin and Moffat, 1995)) can 
be generated. We use a fixed code generated from the largest database file 
(a separate code per database file does not improve compression much). 
Twenty bits was chosen as a reasonable limitation on the length of the 
bit strings, as a table 1,048,576 entries wide used for decoding seemed 
reasonable and larger string lengths provided minimal improvements. 
Given this maximum, empirical testing on the databases showed a num- 
ber around 10,000 to be the best choice for the maximum run length 
allowed before escaping to a 32-bit integer description. Increasing the 
number of symbols crowded the space of bit strings available for 
compression too much, and decreasing the maximum run length 
increased the number of escaped symbols too much. 

5 The previous types used to predict the types of the first two runs are 
set by looking ahead at these two symbols and using the values that 
will correctly predict them. These values are stored at the front of the 
compressed bit-string using three bits. 
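
Steps 1 to 3 and step 5 above can be sketched in Python as follows. This is only an illustration under stated assumptions: capture and threatened capture positions are represented by `None`, the header is kept as a pair of values rather than three bits, the Huffman coding of step 4 is left out, and all names (`fill_dont_cares`, `to_runs`, `encode_runs`, `decode_runs`, `MISS`) are hypothetical. Because consecutive runs differ and there are only three values, a miss symbol is enough for the decoder to recover the actual value.

```python
W, L, D = "W", "L", "D"
MISS = "MISS"  # hypothetical marker for an incorrectly predicted run value


def fill_dont_cares(values):
    """Step 1: replace None with the value that continues the current run."""
    current = next((v for v in values if v is not None), D)
    filled = []
    for v in values:
        if v is not None:
            current = v
        filled.append(current)
    return filled


def to_runs(filled):
    """Step 2: convert to (value, run length) pairs."""
    runs = []
    for v in filled:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return [(v, n) for v, n in runs]


def encode_runs(values):
    """Steps 3 and 5: emit run lengths, with MISS before mispredicted runs."""
    runs = to_runs(fill_dont_cares(values))
    if not runs:
        return (W, L), []
    first = runs[0][0]
    second = runs[1][0] if len(runs) > 1 else next(v for v in (W, L, D) if v != first)
    # Header: the two fictitious earlier values that predict the first two runs.
    prev2, prev1 = first, second
    symbols = []
    for value, length in runs:
        if value != prev2:  # the prediction is the run before the previous run
            symbols.append(MISS)
        symbols.append(length)
        prev2, prev1 = prev1, value
    return (first, second), symbols


def decode_runs(header, symbols):
    """Rebuild the filled value sequence from the header and symbol string."""
    prev2, prev1 = header
    out, miss = [], False
    for s in symbols:
        if s == MISS:
            miss = True
            continue
        if miss:
            # Not the prediction and not the previous run: the third value.
            value = next(v for v in (W, L, D) if v not in (prev2, prev1))
        else:
            value = prev2
        out.extend([value] * s)
        prev2, prev1 = prev1, value
        miss = False
    return out
```

The resulting string of lengths and miss symbols is what a length-limited Huffman code would then compress in step 4.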

With the new scheme the complete 2-piece through 8-piece databases reduce 
in size from 5.6 to 2.7 GB, cutting the database in half (averaging out to 155 
positions per byte). The complete 9-piece databases are 16.8 GB, an average 
of 227 positions per byte. The 10-piece databases (5 pieces versus 5 pieces) 
compress to 125 GB, 65 positions per byte. This represents a substantial im- 
provement over the 22 positions per byte seen for the 4 pieces versus 4 pieces 
subset of the 8-piece databases. 

Table 3 shows that the wall clock time is dominated by the I/O-intensive 
phases. The captures, lookups, and verify phases all sequentially proceed 
through the data. However, each may result in a (usually small) search to 
resolve the value of the position by looking up values in previously computed 
databases. This search is a consequence of the data compression scheme used 
(which removes the value for any capture and threatened capture position). The 
alternative was to keep the uncompressed data on disk and use that instead. This 
was not done because of the possibility of introducing an error; the values based 
on I/O operations (e.g., capture positions) have not been verified for correct- 
ness. Rather than trust unverified data, we preferred the (slower) use of the 
compressed data. 

The capture phase runs quite quickly. Surprisingly, typically over 60% of 
the positions get resolved in this phase. Each position has, on average, slightly 
more than one legal capture move. The remaining positions need to have a 
lookup performed. These positions average roughly 3 moves by the leading 
checker(s), each of which has to be looked up. Each of these searches is, on 
average, considerably more expensive than a simple capture position. Thus, 
even though the lookup typically resolves only 10-15% of the database, it runs 
slower than the captures phase because of the increased amount of I/O. 

Each position has I/O performed on it a maximum of two times. Capture 
positions are visited only in the captures phase; they are not included in the 
final compressed database, so no verification has to be done. All the remaining 
positions may have to have I/O done twice: once to do a lookup of any leading 
checker moves, and once to verify the position value if there is no threatened 
capture. 

The databases have been organized to increase data locality. Database slices 
that are likely to lead into one another are located physically close to each other 
in a database file. As well, the program maintains its own internal disk paging, 
allowing the program to prioritize the database pages kept in memory. The 
result is that the program, using 200 MB of page buffers, ends up doing one 
disk I/O for an average of 500 database position value requests. In other words, 
the hit rate is 499/500. 
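
The text does not describe the paging policy, only that the program manages its own page buffers. As a rough illustration, the following Python sketch assumes a plain least-recently-used policy (an assumption, not the authors' scheme) and a hypothetical `load_page` callback standing in for a disk read. It also counts hits and misses so that a hit rate such as 499/500 can be measured.

```python
from collections import OrderedDict


class PageCache:
    """Application-level page cache with least-recently-used eviction."""

    def __init__(self, capacity, load_page):
        self.capacity = capacity      # maximum number of pages held in memory
        self.load_page = load_page    # hypothetical callback that reads a page from disk
        self.pages = OrderedDict()    # page id -> page data, oldest first
        self.hits = 0
        self.misses = 0

    def get(self, page_id):
        if page_id in self.pages:
            self.hits += 1
            self.pages.move_to_end(page_id)  # mark as most recently used
        else:
            self.misses += 1
            if len(self.pages) >= self.capacity:
                self.pages.popitem(last=False)  # evict the least recently used page
            self.pages[page_id] = self.load_page(page_id)
        return self.pages[page_id]

    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```

A policy tuned to the application could replace the eviction rule, for example by preferring to keep pages from slices that the current slice is likely to lead into.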

I/O could be significantly reduced if the database construction program used 
slices selectively. Some of the databases are relatively small, and slicing them 
into 49 pieces incurs a lot of unnecessary overhead. These databases could be 
constructed as one big computation. For example, the 1045 database has only 
45 billion positions, using roughly 10.5 GB. Rather than slicing this database into 
42 slices, each with a lookups phase, the entire database could be done as a 
single computation. Then the lookups would only be required for part of the 
database, where there was a leading checker on the 7th rank. This has not 
been done. 

It may seem that the non-captures phase should require the most computa- 
tional effort, given that this phase must make repeated passes over the data. 
Further, some of the databases are too large to be resident in RAM, requiring costly 
disk paging. Fortunately, this was not a problem in our implementation. The 
non-captures phase was set up so that references to values in other databases 
(requiring I/O operations) were not needed. The position indexing scheme was 
organized to facilitate spatial and temporal locality. This allowed a (relatively) 
small working set of data to be resident in memory during the non-captures 
phase. This was facilitated by having an internal paging mechanism, allowing 
the program to take advantage of application-dependent properties to minimize 
the I/O. On our machines, 200 MB of RAM was allocated for pages. With this, 
we have been able to complete the non-captures phase on files as large as 25 
GB in only a few days. 

It is interesting to note that the profile of the database computation has 
changed significantly since we did this work in the early 1990s. Some parts of 
the program that were previously I/O bound are now CPU bound (more memory 
to eliminate costly I/O), while other parts that were CPU bound are now I/O 
bound (CPU speed has improved more than disk speed). This meant that we 
had to re-profile the program and use additional optimization techniques. 

3.5 Errors 

Given that this computation takes many CPU years to run and transfers terabytes of 
data to and from disk, it is critical that an error not be allowed to 
creep into the calculation. An error early on in the computation, for example, 
may result in the entire calculation having to be repeated. For example, in 
October 2001, Gil Dodgen and Ed Trice calculated the 8-piece databases. We 
compared the Chinook results with theirs and discovered a difference in the 
7-piece results (Dodgen and Trice, 2002). It eventually turned out that the 
Chinook databases were wrong (a few thousand positions). However, even 
with the error the databases still passed all our verification tests! This may 
seem strange, but it can happen. The computed data can be internally consistent, 
but wrong. The best way to verify the correctness of the databases is to have 
them independently computed and then the results compared — as we did with 
the Dodgen/Trice data. 2 Needless to say, we are hoping that this experience is 
not repeated with our 9- and 10-piece calculations. 



2 We are aware of another effort to compute the 9-piece databases and (apparently) the 10-piece databases. 
We have made two offers to exchange information with this party so that the correctness of both of our efforts 
could be verified. The offers have been declined. 

During the course of the calculations, we had to contend with a faulty CPU, 
bad memory, a disk crash, network errors and operator errors. In some cases, 
these errors were trivial to spot (dead disk), while others proved more sub- 
tle (faulty memory chip). Precautions were taken to reduce the likelihood of 
introducing an error into the computation: 

1 All calculations were logged. This was useful if a post-mortem was 
needed to identify the reason(s) for a computation failure. 

2 All data copied over a network was verified. The source and destination 
files had a cyclic redundancy check (CRC) value computed, and the two 
had to match. In practice, most copies worked correctly. However, at 
least once a month the CRC check would fail, signaling a copy error. 

3 The database files were augmented with a 32-bit CRC number for each 
block of 1024 bytes. Whenever a disk read (local or over the network) 
was performed, the data read would be verified for consistency with the 
CRC number. This enhancement allowed us to find a subtle bug in the 
program, and occasionally would uncover a read failure that was not 
reported by the operating system. 

4 All data computed — databases in their original and compressed form — 
were archived to tape. Thus, if a catastrophic event occurred (e.g., an 
error was discovered in the early part of the computation), we would 
be able to recover by repairing the faulty data rather than having to re- 
compute it from scratch. The need to retrieve data from tape occurred 
only once. 
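
The per-block check in precaution 3 can be sketched in Python using the standard `zlib.crc32` function. The file layout (each 1024-byte block followed by a little-endian 32-bit CRC) and the function names are assumptions made for illustration; the text does not say where the CRC values were stored or which CRC polynomial was used.

```python
import struct
import zlib

BLOCK_SIZE = 1024  # block size given in the text


def add_block_crcs(data):
    """Return data with a 32-bit CRC appended after each 1024-byte block."""
    out = bytearray()
    for start in range(0, len(data), BLOCK_SIZE):
        block = data[start:start + BLOCK_SIZE]
        out += block
        out += struct.pack("<I", zlib.crc32(block))
    return bytes(out)


def read_verified(stored):
    """Strip the CRCs again, raising IOError if any block fails its check."""
    out = bytearray()
    pos = 0
    while pos < len(stored):
        remaining = len(stored) - pos
        if remaining < 4:
            raise IOError("truncated block at offset %d" % pos)
        # The final block may be shorter than BLOCK_SIZE.
        block_len = min(BLOCK_SIZE, remaining - 4)
        block = stored[pos:pos + block_len]
        (expected,) = struct.unpack_from("<I", stored, pos + block_len)
        if zlib.crc32(block) != expected:
            raise IOError("CRC mismatch in block at offset %d" % pos)
        out += block
        pos += block_len + 4
    return bytes(out)
```

Checking on every read, rather than only after a copy, is what allows silent read failures to be caught at the point where the bad data would otherwise enter the computation.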

Despite all the above precautions, occasionally the computation of a database 
slice failed to verify, even though the logs showed no record of any error oc- 
curring. 

Are the databases correct? We do not know, but hope that someone will soon 
repeat our calculations and confirm our results. 

3.6 System Issues 

For the checkers computation, keeping many machines 100% busy is a dif- 
ficult task. It is complicated by the calculation dependencies (some databases 
must be computed before others), hardware specialization (run I/O-intensive 
jobs on machines with fast disks; run CPU-intensive jobs on machines with fast 
processors), and disk management (transferring files; making sure that disks 
do not fill up). We developed tools that can automate most of the computa- 
tion dependency and hardware specialization issues (Goldenberg et al., 2003). 
However, managing the data turned out to be labour intensive and a source of 
potential errors. We were unable to find or build a usable tool that could properly 
manage the data file dependencies, taking into account disk space constraints, 
in such a way as to maximize throughput. This appears to be a very difficult 
problem, but one that needs to be solved if data-intensive computations are to 
be fully automated. 

4. Results 

This section discusses the results of computing all the 9-piece databases and 
the 5 pieces versus 5 pieces subset of the 10-piece databases. 

4.1 Computation 

Table 4 shows the sizes of the databases completed (see footnote 3). In total, 13.1 trillion positions 
have been computed. We claim that this is the largest endgame database (in 
terms of number of positions) yet computed for any game. 

The computation took 18 months. The 9-piece calculation began in Novem- 
ber 2001 and the 10-piece in January 2002. These computations ended in June 
2003. Most of the work was completed on dual-processor AMD machines. The 
memory used ranged from 1 to 4 GB. Older, slower (800 MHz) computers were 
used to pre-compute the captures phase of the computation. The lookups, non- 
captures, and verification phases were done using an average of 3 machines, 
with an average speed of 1.5 GHz. All phases used both processors to speed 
up the computation. 

We had infrequent access to a 64-processor SGI O3000 (500 MHz) with 32 
GB of RAM. The machine was used to run the non-captures phase of many 
of the largest database slices. The database program was parallelized using 
POSIX threads so that the range of positions could be equally divided between 
the processors and computed in parallel. The largest computation (171 billion 
positions) took 2.3 days of SGI time to resolve. The length of time was due 
to the relative slowness of the processors (500 MHz) and the number of passes 
over the data that were required to resolve all the positions. 
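
The equal division of a position range between processors can be illustrated with a small Python sketch. It shows only the partitioning step, not the POSIX threads implementation itself, and the function name `split_range` is hypothetical.

```python
def split_range(n_positions, n_workers):
    """Split indices 0..n_positions-1 into n_workers contiguous half-open ranges."""
    base, extra = divmod(n_positions, n_workers)
    ranges = []
    start = 0
    for worker in range(n_workers):
        # The first `extra` workers take one additional position each.
        size = base + (1 if worker < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges
```

Each worker would then iterate over its own range on every pass, so range sizes differ by at most one position.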

The total amount of computing done is difficult to estimate given that a varying 
number of machines were used, with different numbers of processors, and 
with differing processor speeds. Normalized to a 1.5 GHz processor, a ballpark 
estimate is that the complete 2 through 9-piece databases and the 5 versus 5 
piece subset of the 10-piece databases required 15 CPU years of computing. 

Since a few of the 6 versus 4 piece database slices have been computed (low 
priority on a single machine), we could actually start computing the 11-piece 
database (6 versus 5 subset). This computation is roughly 10-fold bigger (117 
trillion) than what has already been accomplished. We will not pursue this 
unless the 10-piece databases are insufficient for solving the game of checkers 
in a reasonable amount of time. 



3 Note that some 6 piece versus 4 piece slices have been computed. 

Num Pieces   Pieces/Side                Size      Total Completed
1            1-0                         120                  120
2            2-0                       3,484
             1-1                       3,488                6,972
3            3-0                      65,192
             2-1                     196,032              261,224
4            4-0                     883,458
             3-1                   3,546,384
             2-2                   2,662,932            7,092,774
5            5-0                   9,237,424
             4-1                  46,409,320
             3-2                  93,041,488          148,688,232
6            6-0                  77,526,288
             5-1                 467,999,856
             4-2               1,174,279,692
             3-3                 783,806,128        2,503,611,964
7            7-0                 536,417,856
             6-1               3,782,903,904
             5-2              11,404,950,960
             4-3              19,055,258,760       34,779,531,480
8            8-0               3,118,957,920
             7-1              25,172,147,520
             6-2              88,657,111,920
             5-3             177,982,456,720
             4-4             111,378,534,401      406,309,208,481
9            9-0              15,455,930,880
             8-1             140,531,639,040
             7-2             566,442,589,440
             6-3           1,328,448,083,840
             5-4           1,997,749,399,776    4,048,627,642,976
10           10-0             65,975,569,920
             9-1                           0
             8-2                           0
             7-3                           0
             6-4                           0
             5-5           8,586,481,972,128    8,652,457,542,048
Total                                          13,144,833,586,271

Table 4. Databases completed. 



4.2 Statistics 

Because of the concurrency used in the non-captures phase (2 processors 
would iterate on a slice in parallel), it is hard to know the exact number of 
ply required to resolve a slice. There were some slices that needed over 180 
iterations to resolve, a lower bound that is probably very close to the actual 
number. Consider what this number means. There were slices where over 180 
ply were needed before a capture could be forced or the leading checker could 
safely advance one square. In the latter case, one wonders how many more ply 
would be needed to win the game once that checker had safely advanced a single 
square — it could be huge! This gives rise to the speculation that there are 10- 
piece positions that may require many hundreds of ply to solve. For example, 
Gil Dodgen and Ed Trice have built a perfect-play 7-piece database, and they 
report the longest win (against best play) to be 253 ply (127 moves) (Trice and 
Dodgen, 2003). There must be 10-piece positions that are considerably longer 
than that. 

The previous discussion illustrates the disadvantage of computing only W/L/D 
values. Chinook could reach a 10-piece position and not know how to win it. 
The search could flounder, not being able to choose between winning moves to 
find a quick path to victory. The (real) danger is that the program will end up 
cycling around, not knowing how to make progress (although this has not been 
seen in practice). 

4.3 Solving Checkers 

The total possible search space for the game of checkers is 5 x 10^20 (see 
Table 5) — a daunting number. But how much of it has to be explored to solve 
checkers? Three assumptions can be used to get a rough upper bound on the 
effort required to solve checkers. The following heuristics are used to identify 
the key search space for the proof tree; parts that are excluded may be needed 
in the case of proving trivially won positions. 

■ Material Balance: An advantage of 2 or more pieces is huge; equivalent 
to roughly a rook or more in chess. It seems reasonable to assume that 
a proof would not have to go through positions with lop-sided material. 
The useful positions are those where the material balance is even, or one 
side has a single piece advantage. 

■ King Balance: One side having 3 or more kings than the other rarely 
occurs in practice. Hence we limit the search space to subsets where the 
number of kings for each side differs by at most 2. 

■ Number of Kings: Kings only appear on the board later in the game. 
For example, although it is theoretically possible to have 24 pieces on the 
board with one of them being a king, this scenario is highly contrived. A 
reasonable assumption is to limit the number of kings to 6 when 
there are 10 or fewer pieces on the board, 4 with 12 or more pieces, 2 with 
14 or more pieces, and zero with 24 or fewer pieces. 
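
The first two heuristics can be expressed as a simple filter over database subsets, sketched here in Python. The function name and the parameterisation by king and checker counts per side are assumptions made for illustration, and the third heuristic (the limit on the number of kings) is deliberately left out because its thresholds are not fully specified above.

```python
def is_plausible(black_kings, white_kings, black_checkers, white_checkers):
    """Apply the material-balance and king-balance heuristics to one subset."""
    black_total = black_kings + black_checkers
    white_total = white_kings + white_checkers
    # Material balance: even material, or a single piece advantage.
    material_ok = abs(black_total - white_total) <= 1
    # King balance: the number of kings per side differs by at most 2.
    king_ok = abs(black_kings - white_kings) <= 2
    return material_ok and king_ok
```

Under the material-balance rule alone, for example, the only 4-piece subset that survives is 2 pieces versus 2 pieces.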

Table 5 shows the results of applying the above assumptions. From O(10^20) 
the potential search space drops to O(10^14). Of this, the databases computed 
thus far represent roughly 7.5 trillion positions, 5% of the reduced search space. 
It is too early to know the full impact of the 10-piece databases in the checkers proof. 

Pieces                 Database Size       Plausible Bound
1                                120                   120
2                              6,972                 3,488
3                            261,224               196,032
4                          7,092,774             2,662,932
5                        148,688,232            89,972,128
6                      2,503,611,964           759,865,120
7                     34,779,531,480        17,681,009,520
8                    406,309,208,481       103,706,534,351
9                  4,048,627,642,976     1,551,749,730,336
10                34,778,882,769,216     5,862,356,551,488
11               259,669,578,902,016    21,456,015,775,392
12             1,695,618,078,654,976    46,262,266,685,096
13             9,726,900,031,328,256    22,268,142,277,920
14            49,134,911,067,979,776    29,879,692,089,280
15           218,511,510,918,189,056       802,158,318,720
16           852,888,183,557,922,816       723,777,011,100
17         2,905,162,728,973,680,640     2,169,968,941,008
18         8,568,043,414,939,516,928     1,527,822,346,512
19        21,661,954,506,100,113,408     3,587,090,153,856
20        46,352,957,062,510,379,008     1,959,596,777,424
21        82,459,728,874,435,248,128     3,564,284,669,088
22       118,435,747,136,817,856,512     1,489,690,180,992
23       129,406,908,049,181,900,800     2,057,391,420,240
24        90,072,726,844,888,186,880       641,335,986,590
Total    500,995,484,682,338,672,639   145,925,579,158,733

Table 5. Reducing the checkers search space. 

5. Conclusions 

Disks are getting larger and cheaper; terabyte systems are affordable and 
petabyte systems exist. Moore’s law continues to hold and multi-processor 
systems are ubiquitous. RAM is inexpensive, and hardware and operating 
systems are gradually moving to accommodate large memories. In effect, there 
is no technological limit to pushing database technology to even greater heights. 
The endgame databases reported here contain over 10^13 data points, a 30-fold 
increase over what seemed possible a decade ago. High-end technology that is 
available today could be used to push this to 10^14. 

The reason for computing the 10-piece databases was to solve the game of 
checkers. The databases eliminate the bottom of the search tree. A separate 
project is building the top of the proof tree, searching forward from the root 
towards the databases. When the two search frontiers meet, checkers will be 
solved. At this point in time, it is too early to tell how soon this will happen. 


Abstract: Many research teams and individuals have computed endgame databases for 
the game of chess which use the distance-to-mate metric, enabling their 
software to forecast the number of moves remaining until the game is over. 
This is not the case for the game of checkers. Only one programming team has 
generated a checkers database capable of announcing the distance to the 
terminal position. This paper examines the benefits and detriments associated 
with computing three different types of checkers endgame databases, 
demonstrates the solutions to the longest wins in the 7-piece checkers 
database, presents tables of longest wins for positions including all 
permutations of four pieces and fewer against three pieces and fewer, and 
offers major improvements to some previously published play. 

Keywords: checkers, database, endgame, move to win, perfect play 



1. Introduction 

It is a widespread misconception that since the rules of the game of 
checkers are simple, so is the playing of the game. This misperception is not 
limited to just the general public. Several reputable scientific sources have 
disseminated inaccurate information regarding the state of computer 
checkers (Gibson, 1993; Schaeffer, 1997 pp. 101-102). 

“Computers became unbeatable in checkers several years ago.” Thomas Hoover, 
“Intelligent Machines,” Omni magazine, 1979, p. 162. 

“...an improved model of Samuel’s checkers-playing computer today is virtually 
unbeatable, even defeating checkers champions foolhardy enough to ‘challenge’ it to a 
game.” Richard Restak, The Brain, The Last Frontier, 1979, p. 336. 

“Although computers had long since been unbeatable at such basic games as checkers...” 
Clark Whelton, Horizon, February 1978. 

“So whereas computers can ‘crunch’ tic-tac-toe, and even checkers, by looking all the way 
to the end of the game, they cannot do this with chess.” Lynn Steen, “Computer Chess: 
Mind vs. Machine,” Science News, November 29, 1975. 

On August 29, 1992, World Checkers Champion Dr. Marion Tinsley 
defeated the world’s strongest checkers program, CHINOOK (Schaeffer, 1997 
pp. 328-332). The score of their match was 4 wins for Tinsley, 2 wins for 
CHINOOK, and 33 draws. Tinsley’s four wins disproved the notion that 
checkers programs were “unbeatable”. 

The CHINOOK team experienced a great deal of success while ascending 
the competition rungs, which allowed them to challenge Dr. Tinsley for the 
title of “Man vs. Machine” World Champion in 1992. One of the key factors 
that made CHINOOK such a strong program was the size of its endgame 
databases. The program eventually had access to 443,748,401,247 pre- 
computed positions that were known to be either wins, losses, or draws 
(Lake, Schaeffer, and Lu, 1994). These data were available at runtime during 
the look-ahead search, which allowed CHINOOK to enter into lines of play 
that would avoid losses (within its horizon of search) as well as discover deep, 
subtle wins. 

Having such game-theoretical values (GTV) available for the search 
engine at runtime is extremely valuable, but in certain cases it is not enough 
information to procure the win. Section 3.2 showcases some 7-piece 
positions that are wins for the side to move but cannot be won using only a 
database with the game-theoretical values stored. While such a database 
recognizes the wins as it builds its tree during the search, it cannot determine 
the winning sequence. A database with information associated with the 
distance until a conversion (capture of a piece, or promotion of a checker to 
a king) takes place is of some help. Examples are presented that demonstrate 
the power of such a Distance To Conversion database, as well as some of the 
weaknesses. 

This paper is organized as follows. Section 2 presents an overview of the 
three different types of checkers endgame databases, and briefly tabulates 
the pros and cons of each category. Section 3 contains the solution to the 
longest 7-piece database win and a comprehensive listing of all of the data 
collected in the 2-to-7-piece Perfect Play Lookup (PPL) databases. The 
longest wins are presented in each sub-database grouping. Section 4 
demonstrates several improvements to a very common checkers endgame 
known as Fourth Position. This ending was first published in 1756 and has 
been studied by the tournament checkers playing community ever since. It 
should be noted that the PPL database solution begins in such an unorthodox 
fashion that it is worthy of special attention. Section 5 offers a brief 
conclusion regarding what has been learned, particularly about the 
complexity of the game of checkers. 

By improving upon the play of a common checkers endgame first 
published 247 years ago that has since been studied and analyzed by the 
strongest human players in the history of checkers, this paper asserts that the 
PPL database is capable of outperforming the world’s best human players 
from any time period. 

2. Overview of Different Types of Databases 

There are three different ways to catalog checkers information that can be 
useful to a program. Each type of database has benefits and drawbacks, 
which are summarized in Table 1. 



Game Theoretical Value (GTV) 

  Benefits: 
    1) Easiest type of database to compute. 
    2) Can be generated quickly. 
    3) A post-process routine can compress the data efficiently, 
       allowing for runtime probing to assist the Alpha-Beta 
       search engine. 

  Drawbacks: 
    1) Once reaching a database position over the board, no 
       information is available regarding the best way to proceed. 
    2) Positions that are theoretically won can in practice be drawn 
       by repetition since the winning path cannot be found. 

Distance To Conversion (DTC) 

  Benefits: 
    1) Can be computed about as easily as a GTV database. 
    2) In King-heavy endings the play can mirror PPL database results. 
    3) A win can always be achieved, even if it takes much longer 
       than a PPL database’s solution. 
    4) Positions with draws as the result can be removed from the 
       database. 

  Drawbacks: 
    1) Requires much more RAM and disk space to compute compared to 
       a GTV database. 
    2) The database prefers a known conversion path which may take 
       longer to win than a potentially much shorter path to victory 
       known by a PPL database. 
    3) With fewer kings on the board, playing precision is much lower 
       than that of a PPL database. 
    4) Post-process compression is not nearly as good as that of a 
       GTV database. 

Perfect Play Lookup (PPL) 

  Benefits: 
    1) Always wins by selecting the shortest possible route. 
    2) Always capable of postponing losses for as long as possible. 
    3) Positions with draws as the result can be removed from the 
       database. 
    4) The verification routine virtually guarantees that the GTV 
       database used in the process is correct. 

  Drawbacks: 
    1) Difficult to compute both in algorithm complexity and time 
       requirements. 
    2) Requires much more RAM and disk space to compute compared to 
       a GTV database. 
    3) Post-process compression is not nearly as good as that of a 
       GTV database (but can be as good as that of a DTC database). 

Table 1. Benefits and drawbacks of the different types of checkers databases. 

2.1 Application of a GTV Database 

Typically, a large database of game-theoretical-value information can be 
probed at run time, greatly reducing the number of nodes that need to be 
evaluated by a search engine. Once a database position is encountered in 
RAM, no additional move generation or merit assignments need to be 
invoked. The GTV score is returned, and that particular leaf node has perfect 
information attached to it. Schaeffer, Lake, Lu, and Bryant (1996) have 
demonstrated the benefits of this approach with their program CHINOOK, 
which won the World “Man versus Machine” Championship in 1994 and 
successfully defended its title in 1995. In probing such a database, the search 
tree can return valuable information while still a great distance away. The 
examples in Figures 1 to 3 demonstrate how a 12-piece position can be 
evaluated as a forced win with only a six-piece database accessible in RAM. 

From the position in Figure 1, a program with the black pieces will 
generate not only moves for evaluation such as 6-10 and 2-7, but those 
that seem to toss away material as well, such as 20-24 and 14-17. The 
program will arrive at Figure 2 after Black initiated the sequence 14-17 
21x14, 20-24 28x19, 6-9 13x6, 2x9x18x27. With White to move, the 6-piece 
database result is a draw, so the score for Black to move from the 
parent 12-piece position is backed up as a draw. 

As other jump paths are examined, the position shown in Figure 3 will be 
reached after 14-17 21x14, 20-24 28x19, 6-9 13x6, 1x10x17x26. The 6- 
piece database position shown in Figure 3 is a loss for White to move, so the 
score for Black to move from the parent position is returned as a win. Since 
all captures are forced in the game of checkers, the program will elect to 
enter into the inescapable line of play leading to Figure 3. In so doing, the 
program will properly announce a win from the 12-piece position in Figure 
1, and play the move 14-17. 

Figure 1. Black to move wins by forcing a trade into a won six-piece 
database position. 

A quick glance at Figures 2 and 3 will not reveal anything obvious to the 
casual player, but after conducting a search the correct result becomes 
evident. The most outstanding feature of the GTV information is that no 
such search ever needs to be performed once a position in the database is 
found. This is the equivalent of “extending the search” from the database 
position to the terminal node many plies distant. 
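
The effect of probing can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the search of any program named in this paper: the position names, the `children` map, and the `search` function are hypothetical, and the two database entries simply restate the results given for Figures 2 and 3.

```python
WIN, DRAW, LOSS = 1, 0, -1  # values from the point of view of the side to move


def search(pos, children, gtv_db, visited):
    """Negamax that stops as soon as the GTV database knows the position."""
    visited.append(pos)
    if pos in gtv_db:
        return gtv_db[pos]  # perfect value: no move generation, no evaluation
    moves = children.get(pos, [])
    if not moves:
        return LOSS  # the side to move cannot move and therefore loses
    # A child's value is from the opponent's point of view, so negate it.
    return max(-search(c, children, gtv_db, visited) for c in moves)


# Hypothetical stand-ins for the two forced lines that start from Figure 1.
children = {"fig1": ["fig2", "fig3"]}
gtv_db = {"fig2": DRAW, "fig3": LOSS}  # White to move in both positions
visited = []
root_value = search("fig1", children, gtv_db, visited)
```

In this toy tree the root is scored as a win for Black after only three positions are touched, because both leaves are answered by the database rather than by further search.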




Figure 2. White to move draws. Figure 3. White to move loses. 

In the case of Figure 3, the PPL database indicates that White loses after 
a total of 102 plies. So, a GTV database, by identifying that position as a 
theoretical win, performed the functional equivalent of searching 102 plies 
and returning a score indicating a loss would occur. Notice that the number 
of pieces in a GTV database can be relatively small yet it can still effectively 
direct the search from a position with many more pieces. However, as will 
be shown in Subsection 3.2, once a program with only a GTV database is 
actually in a won position, it can in certain cases be very difficult to 
converge on the win. 

2.2 Creating a DTC Database 

A DTC database will store the number of plies for each position until 
either a checker crowns or a piece is captured. It does not have any 
information to distinguish which result is achieved as the goal; it only knows 
it is heading for one or the other. Unlike a GTV database, which can 
represent four positions in each byte during the computation (and five 
positions per byte after the computation, prior to compression), the DTC 
database needs one entire byte for each position in order to store a 
conversion range from 0 to 255 plies. 
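
The byte counts above can be made concrete with a small sketch. The text does not say how five positions fit into one byte; the code below assumes a base-3 packing of win/loss/draw values, which works because 3^5 = 243 fits in the 256 values of a byte. The function names and the value coding are hypothetical.

```python
def pack5(values):
    """Pack five ternary values (0, 1 or 2) into one byte, assuming base 3."""
    assert len(values) == 5 and all(0 <= v < 3 for v in values)
    byte = 0
    for v in reversed(values):
        byte = byte * 3 + v
    return byte  # always in the range 0..242


def unpack5(byte):
    """Recover the five ternary values stored by pack5."""
    out = []
    for _ in range(5):
        byte, v = divmod(byte, 3)
        out.append(v)
    return out
```

During the computation a fourth state (for example "not yet resolved") is presumably needed, which would explain why only four positions, at two bits each, fit in a byte at that stage; this reading is an assumption as well.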

The process of creating the DTC database is nearly identical to that of 
creating the GTV database. If the DTC database is also being used as a GTV 
database, then two of the eight bits in the byte must be reserved for the 
win-loss-draw assessment, leaving only six bits (0-63) available for the 
maximum depth to conversion. There is a way to double this ply count: store 
the actual depth divided by two, and take note that if the side to move 
wins, the ply count must be an odd number, since that side makes the last 
move. If the side to move loses, it makes the second-to-last move, so the 
ply count must be an even number. Therefore, to “decompress” the true ply 
count, double the number that is stored, then add one to it if the position 
is a win. If it is a loss, there is no need to change the number. If it is 
a draw, the depth until conversion is meaningless. If the DTC database is 
created as a separate post-process, and the GTV database is used to 
determine the win-loss-draw status of a position, then all eight bits can 
be used to store conversion information. Using the aforementioned 
division-by-two scheme, a maximum conversion depth of 511 plies can be 
stored. 
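
The division-by-two scheme can be written out as follows. This is a minimal sketch of the arithmetic described above; the function names are hypothetical, and the win/loss flag is assumed to come from a separate GTV lookup.

```python
def encode_plies(plies, side_to_move_wins):
    """Store a ply count in half the range; wins are odd, losses are even."""
    if side_to_move_wins and plies % 2 == 0:
        raise ValueError("a win for the side to move takes an odd ply count")
    if not side_to_move_wins and plies % 2 == 1:
        raise ValueError("a loss for the side to move takes an even ply count")
    return plies // 2


def decode_plies(stored, side_to_move_wins):
    """Double the stored number and add one if the position is a win."""
    return 2 * stored + (1 if side_to_move_wins else 0)
```

With all eight bits available the largest stored number is 255, which decodes to 511 plies for a win; with six bits the largest stored number is 63, which decodes to 127.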

The iteration process for the DTC database begins with the jump pass, 
which is the same as would be performed with a GTV database, with one 
notable difference. The DTC database stores the result for each jumping win 
as a “1-ply” conversion. Therefore the counter for each win in which a jump 
exists will be set to 1. Next, any crowning moves are generated, and 
independent of the win or loss result, these are stored as a conversion in 1 
ply as well. Thereafter, as each pass over the database is made, whenever a 
win or loss result is able to be determined, the iteration number is stored in 
the database. The idea is that more difficult positions (presumably) will have 
conversion iteration counts greater than positions that are near the 
conversion horizon. 

When the computation is completed, one goes on to the next slice, but the 
conversion information for the crowning moves or jumps is not inherited. 
Each database slice is computed independently of all others. No conversion 
information is shared across databases. 

2.3 Weaknesses of DTC Databases 

In difficult positions where there is a majority of Kings, a DTC database 
is most valuable. As more Checkers are introduced into a position, the 
probability that a DTC database will make the same, highly accurate move 
as the PPL database diminishes. Even in elementary positions like the one 
shown below in Figure 4, a DTC database will not play a move that is 
obvious to any ordinary player. 

Even novice checker players will make the move 18-23 in Figure 4, 
winning after White makes any move with the King, but a DTC database 
will not. 

Figure 4. A DTC database will not play 18-23, the move that wins most 
quickly. 



The glaring weakness of a DTC database centers on a potential “one ply 
conversion horizon” that can arise during a computation cycle. A move that 
converts in fewer plies but takes much longer to win could be preferred 
over a shorter win that takes longer to convert. With a quick glance anyone 
can see that in Figure 4 the sequence 18-23 31-26 23x30 wins, as does 18-23 
31-27 23x32. This requires three plies of information, which the PPL 
database does have, but which the DTC database does not have. The DTC 
database will “know” that 25-29 converts in 1 ply, and 25-30 converts in 1 
ply. When the move 18-23 is examined, it will not lead to an immediate 
conversion, and therefore will be deemed to be inferior. The DTC database 
will have stored a value indicating that the position after 18-23 results 
in a conversion in 2 more plies. This conversion into the “2 against 0” 
database will win, of course, but the DTC has no information regarding the 
“distance to win”. 
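
The difference between the two selection rules can be shown with a toy table for Figure 4. The conversion counts follow the text above, and the 3-ply win after 18-23 follows from the sequences quoted there. The distance-to-win values for 25-29 and 25-30 are not given in the paper and are invented purely for illustration.

```python
# Plies until conversion ("dtc") and plies until the win ("dtw") for each move.
moves = {
    "25-29": {"dtc": 1, "dtw": 9},  # dtw is a hypothetical placeholder
    "25-30": {"dtc": 1, "dtw": 9},  # dtw is a hypothetical placeholder
    "18-23": {"dtc": 3, "dtw": 3},  # capture on the third ply ends the game
}

# A DTC database can only prefer the quickest conversion.
dtc_choice = min(moves, key=lambda m: moves[m]["dtc"])

# A PPL database prefers the quickest win.
ppl_choice = min(moves, key=lambda m: moves[m]["dtw"])
```

Under these assumptions the DTC rule settles on one of the crowning moves, while the PPL rule selects 18-23.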



2.4 Creating a PPL Database 



Unlike a DTC database, a PPL database stores complete information 
about the line of play leading all the way to the terminal (lost) position. It 
does this by backing up and storing the number of plies to win or lose for 
every won or lost position during the database generation process, starting 
with the 2-piece database. Like the DTC database, an entire byte is required 
for each position in order to store this move-to-win (MTW) information. 

Computation of a PPL database is much more difficult, both 
algorithmically and in terms of the amount of calculation required, than that 
of either a GTV or DTC database. This is due to the fact that one is not 
necessarily done once an MTW value has been assigned to a position. 
During the computation of a GTV or DTC database, once a GTV 
(win/loss/draw) or DTC (iteration count) value has been assigned to a 
position, no further computation is required. During the computation of a 
PPL database, the MTW values are subject to change from one iteration to 
the next. The MTW values are backed up through a tree of possible lines of 
play, and this tree dynamically changes as a function of the iteration depth. 
This process essentially amounts to a complex sorting procedure which 
cannot be terminated until a pass is made over the entire database that 
produces no changes in any of the MTW values. 

The sorting procedure works as follows. In a won position all moves are 
generated, and the resulting positions that lose for the other side are queried 
for their MTW values (which may not yet exist or be reliable, depending on 
the position and the current iteration). The smallest value is then backed up 
into the parent position. This represents the move that will result in the 
quickest win. In a lost position all moves are generated, and the largest 
resulting MTW value is backed up into the parent position. (Recall that in a 
lost position all moves must lead to wins for the other side.) This represents 
the move that will result in the longest, most drawn-out loss. The database 
generation program makes repeated passes over the database slice being 
computed, and the sorting process continues until none of the MTW values 
changes. 
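
The repeated-pass procedure just described can be sketched on a toy game graph. This is a simplified illustration, not the generation program used for the databases in this paper: the position names, the `moves` and `gtv` dictionaries, and the function `solve_mtw` are hypothetical, and every position in the toy graph is assumed to be already known as won ("W") or lost ("L") for the side to move.

```python
INF = float("inf")


def solve_mtw(moves, gtv):
    """Iterate MTW values until a full pass produces no change.

    moves: position -> list of successor positions (opponent to move).
    gtv:   position -> "W" or "L" for the side to move.
    """
    # A position without moves is lost in 0 plies; all others start unknown.
    mtw = {p: (0 if not moves[p] else INF) for p in gtv}
    passes = 0
    changed = True
    while changed:
        changed = False
        passes += 1
        for p in gtv:
            if not moves[p]:
                continue
            if gtv[p] == "W":
                # Quickest win: smallest value among successors lost for the opponent.
                new = 1 + min(mtw[c] for c in moves[p] if gtv[c] == "L")
            else:
                # Longest loss: largest value among the successors (all opponent wins).
                new = 1 + max(mtw[c] for c in moves[p])
            if new != mtw[p]:
                mtw[p] = new
                changed = True
    return mtw, passes


moves = {
    "T": [], "A": ["T"], "B2": ["A"], "A2": ["B2"], "B": ["A2"],
    "C": ["B", "D"], "D": ["E"], "E": ["T"], "F": ["A", "C"],
}
gtv = {"T": "L", "A": "W", "B2": "L", "A2": "W", "B": "L",
       "C": "W", "D": "L", "E": "W", "F": "L"}
mtw, passes = solve_mtw(moves, gtv)
```

In this example the won position "C" first receives a provisional value through the slow line via "B" and is lowered in a later pass once the faster line via "D" has been resolved, which is the kind of revision that makes several passes necessary.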

It should be noted that there are some lines of play that lead to wins with 
no “conversion” taking place, and this must be taken into account during the 
generation of the database. This situation occurs when a move is made that 
blocks the opponent so he cannot move. A loss in the game of checkers 
occurs when one side cannot move, due to not having any pieces remaining, 
or having pieces on the board that are blocked in such a way that no moves 
are available. The blocking move leading to the win may or may not involve 
a capture or a promotion (i.e., a conversion). 

3. The Longest 7-Piece Database Win 



Figure 5 contains the longest 7-piece database win. Below we provide 
the PPL database solution. 

Figure 5. Black to move wins in 253 plies with 8-11. 

Listing 1. The PPL database solution to the longest 7-piece win. 

8-11, 12-8, 1-5, 9-13, 10-14, 8-3, 11-16, 13-9, 14-18, 9-14, 18-23, 14-18, 23-27, 18-23, 27-32, 
21-17, 32-28, 17-22, 28-24, 22-26, 24-20, 23-27, 16-19, 26-31, 19-24, 27-32, 20-16, 32-28, 16-19, 3-7, 
4-8, 28-32, 19-16, 31-26, 16-20, 7-10, 8-11, 26-31, 5-9, 10-6, 9-13, 6-10, 20-16, 32-27, 24-28, 
27-32, 16-19, 31-27, 11-8, 27-24, 19-23, 10-14, 8-11, 24-20, 11-15, 20-24, 23-26, 24-27, 26-30, 27-31, 
15-11, 31-27, 11-16, 27-23, 30-25, 14-18, 13-17, 18-14, 25-21, 23-26, 16-19, 26-31, 17-22, 31-27, 19-15, 
27-31, 21-25, 32-27, 15-19, 27-32, 25-30, 31-27, 19-16, 14-10, 30-26, 27-24, 22-25, 10-15, 26-22, 24-20, 
16-12, 15-19, 25-30, 19-15, 30-26, 20-24, 22-17, 15-10, 12-16, 10-15, 17-13, 24-20, 16-12, 15-11, 13-9, 
20-24, 26-31, 24-20, 9-6, 11-15, 6-2, 15-11, 31-26, 11-15, 2-7, 15-18, 7-10, 20-24, 26-31, 24-19, 
12-8, 18-15, 10-7, 15-18, 8-11, 18-14, 31-26, 14-18, 7-10, 19-24, 10-15, 18-14, 26-23, 14-9, 23-19, 
9-5, 11-7, 5-9, 7-10, 24-20, 15-11, 20-24, 19-15, 9-5, 10-6, 5-1, 6-2, 24-20, 15-19, 1-5, 
19-16, 20-24, 16-20, 24-27, 2-6, 5-1, 6-10, 1-5, 11-16, 5-9, 10-15, 9-14, 15-19, 27-31, 20-24, 
31-26, 19-15, 26-22, 24-20, 14-9, 16-19, 9-13, 20-24, 22-17, 19-23, 17-21, 24-19, 13-17, 23-18, 17-13, 
18-22, 21-17, 22-26, 13-9, 19-23, 9-5, 26-30, 5-1, 23-26, 17-13, 15-19, 13-9, 19-23, 9-5, 23-27, 
32-23, 26-19, 5-9, 30-26, 1-5, 26-22, 9-14, 28-32, 5-1, 32-27, 1-5, 27-23, 5-1, 22-18, 14-10, 
18-15, 10-6, 23-26, 6-2, 19-23, 2-6, 26-22, 1-5, 23-18, 6-9, 15-10, 9-13, 18-14, 13-9, 22-18, 
9-13, 10-6, 5-1, 14-10, 13-17, 10-14, 17-10, 6-15, 1-5, 18-14, 5-1, 14-10, 1-5, 10-6, 5-1, 
15-10, 1-5, 6-1, 5-9, 1-5, 9-13, 10-15, 13-17, 15-18, 17-13, 18-22, 13-9, 5-14 

3.1 Perfect Play Data 

Table 2 lists statistics on all of the 2-to-7 piece database slices that were 
solved with four or fewer pieces for one side and Black to move. In it is 
shown the total number of positions as a function of the database slice, the 
number of plies associated with the longest win and loss for each slice, and 
one position for a longest win for Black to move. In some cases, as the 
material distribution becomes more dominant for one side, the longest win 
features a position with a forced jump for the weaker side to move. After a 
bit of reflection, this result makes sense. With the strong side to move, the 
win will precipitate very quickly. The weak side to move will execute a 
jump, perhaps equalizing or even surpassing the material of the former 
strong side of the board, then the total number of plies to win from the 
resulting sub-database will substantially add to the length of the game. In the 
“Position” column, BK = Black King, WK = White King, BC = Black 
Checker, WC = White Checker. 



Material Distribution  Total Positions  Longest Win/Loss   Position 

1K + 0C vs. 1K + 0C    992              11/10              BK: 4; WK: 29 
1K + 0C vs. 0K + 1C    868              11/10              BK: 32; WC: 20 
0K + 1C vs. 1K + 0C    868              5/12               BC: 14; WK: 26 
0K + 1C vs. 0K + 1C    760              13/12              BC: 25; WC: 30 
2K + 0C vs. 1K + 0C    14,880           33/34              BK: 1,2; WK: 19 
2K + 0C vs. 0K + 1C    13,020           33/34              BK: 1,2; WC: 19 
1K + 1C vs. 1K + 0C    26,040           47/48              BK: 32; BC: 4; WK: 23 
1K + 1C vs. 0K + 1C    22,800           47/48              BK: 32; BC: 4; WC: 15 
0K + 2C vs. 1K + 0C    11,340           61/62              BC: 3,4; WK: 26 
0K + 2C vs. 0K + 1C    9,936            61/62              BC: 3,4; WC: 26 
2K + 0C vs. 2K + 0C    215,760          49/48              BK: 26,30; WK: 29,31 
2K + 0C vs. 1K + 1C    377,580          95/94              BK: 2,3; WK: 21; WC: 25 
2K + 0C vs. 0K + 2C    164,430          89/92              BK: 28,31; WC: 6,30 
1K + 1C vs. 1K + 1C    661,200          103/102            BK: 28; BC: 18; WK: 3; WC: 29 
1K + 1C vs. 0K + 2C    288,144          107/108            BK: 28; BC: 4; WC: 27,30 
0K + 2C vs. 0K + 2C    125,664          109/108            BC: 4,24; WC: 29,30 
3K + 0C vs. 1K + 0C    143,840          29/30              BK: 7,16,29; WK: 11 
3K + 0C vs. 0K + 1C    125,860          27/28              BK: 28,31,32; WC: 19 
2K + 1C vs. 1K + 0C    377,580          41/38              BK: 7,16; BC: 8; WK: 3 
2K + 1C vs. 0K + 1C    330,600          37/32              BK: 29,30; BC: 25; WC: 31 
1K + 2C vs. 1K + 0C    328,860          53/54              BK: 19; BC: 9,10; WK: 15 
1K + 2C vs. 0K + 1C    288,144          41/42              BK: 32; BC: 9,14; WC: 13 
0K + 3C vs. 1K + 0C    95,004           59/58              BC: 4,7,8; WK: 3 
0K + 3C vs. 0K + 1C    83,304           55/56              BC: 7,8,11; WC: 12 
3K + 0C vs. 2K + 0C    2,013,760        67/68              BK: 8,29,30; WK: 12,18 
3K + 0C vs. 1K + 1C    3,524,080        89/90              BK: 24,16,12; WK: 20; WC: 29 
3K + 0C vs. 0K + 2C    1,534,680        81/62              BK: 25,26,29; WC: 9,30 
2K + 1C vs. 2K + 0C    5,286,120        147/148            BK: 4,29; BC: 5; WK: 26,30 
2K + 1C vs. 1K + 1C    9,256,800        139/140            BK: 4,30; BC: 5; WK: 22; WC: 10 
2K + 1C vs. 0K + 2C    4,034,016        93/88              BK: 7,26; BC: 16; WC: 11,30 
1K + 2C vs. 2K + 0C    4,604,040        149/148            BK: 4; BC: 5,25; WK: 30,31 
1K + 2C vs. 1K + 1C    8,068,032        159/160            BK: 8; BC: 5,9; WK: 10; WC: 31 
1K + 2C vs. 0K + 2C    3,518,592        111/140            BK: 28; BC: 4,8; WC: 7,12 
0K + 3C vs. 2K + 0C    1,330,056        155/154            BC: 1,3,4; WK: 5,26 
0K + 3C vs. 1K + 1C    2,332,512        161/162            BC: 1,4,5; WK: 14; WC: 24 
0K + 3C vs. 0K + 2C    1,018,056        155/160            BC: 5,7,9; WC: 6,26 
4K + 0C vs. 1K + 0C    1,006,880        29/30              BK: 9,17,26,27; WK: 22 
4K + 0C vs. 0K + 1C    881,020          23/24              BK: 4,28,29,32; WC: 23 
3K + 1C vs. 1K + 0C    3,524,080        29/30              BK: 17,26,27; BC: 9; WK: 22 
3K + 1C vs. 0K + 1C    3,085,600        25/26              BK: 28,31,32; BC: 24; WC: 19 
2K + 2C vs. 1K + 0C    4,604,040        37/38              BK: 31,32; BC: 27,28; WK: 24 
2K + 2C vs. 0K + 1C    4,034,016        31/28              BK: 31,32; BC: 27,28; WC: 30 
1K + 3C vs. 1K + 0C    2,660,112        43/44              BK: 26; BC: 4,11,19; WK: 23 
1K + 3C vs. 0K + 1C    2,332,512        39/40              BK: 4; BC: 7,8,11; WC: 12 
0K + 4C vs. 1K + 0C    573,300          51/52              BC: 7,8,11,15; WK: 12 
0K + 4C vs. 0K + 1C    503,100          49/50              BC: 4,7,8,11; WC: 12 
3K + 0C vs. 3K + 0C    18,123,840       73/74              BK: 3,12,23; WK: 16,31,32 
3K + 0C vs. 2K + 1C    47,575,080       147/146            BK: 3,8,15; WK: 7,22; WC: 28 
3K + 0C vs. 1K + 2C    41,436,360       151/150            BK: 1,8,15; WK: 7; WC: 28,29 
3K + 0C vs. 0K + 3C    11,970,504       149/150            BK: 13,26,28; WC: 5,14,29 
2K + 1C vs. 3K + 0C    47,575,080       147/146            BK: 11,26; BC: 5; WK: 18,25,30 
2K + 1C vs. 2K + 1C    124,966,800      153/152            BK: 10,19; BC: 1; WK: 2,6; WC: 31 
2K + 1C vs. 1K + 2C    108,918,432      161/162            BK: 25,18; BC: 1; WK: 17; WC: 24,28 
2K + 1C vs. 0K + 3C    31,488,912       155/160            BK: 15,27; BC: 22; WC: 23,28,32 
1K + 2C vs. 3K + 0C    41,436,360       151/150            BK: 26; BC: 4,5; WK: 18,25,32 
1K + 2C vs. 2K + 1C    108,918,432      161/162            BK: 16; BC: 5,9; WK: 8,15; WC: 32 
1K + 2C vs. 1K + 2C    95,001,984       167/166            BK: 25; BC: 5,9; WK: 18; WC: 17,30 
1K + 2C vs. 0K + 3C    27,487,512       163/164            BK: 14; BC: 5,6; WC: 25,19,12 
0K + 3C vs. 3K + 0C    11,970,504       149/150            BC: 4,19,28; WK: 5,7,20 
0K + 3C vs. 2K + 1C    31,488,912       155/160            BC: 1,5,10; WK: 6,18; WC: 11 
0K + 3C vs. 1K + 2C    27,487,512       163/164            BC: 8,14,21; WK: 19; WC: 27,28 
0K + 3C vs. 0K + 3C    7,959,904        161/162            BC: 1,2,3; WC: 14,17,19 
4K + 0C vs. 2K + 0C    13,592,880       67/68              BK: 4,12,29,30; WK: 11,22 
4K + 0C vs. 1K + 1C    23,787,540       87/88              BK: 9,10,19,27; WK: 15; WC: 30 
4K + 0C vs. 0K + 2C    10,359,090       51/44              BK: 17,19,26,27; WC: 20,31 
3K + 1C vs. 2K + 0C    47,575,080       135/114            BK: 9,17,29; BC: 5; WK: 13,17 
3K + 1C vs. 1K + 1C    83,311,200       91/88              BK: 17,25,26; BC: 19; WK: 20; WC: 29 
3K + 1C vs. 0K + 2C    36,306,144       95/66              BK: 19,25,27; BC: 17; WC: 30,31 
2K + 2C vs. 2K + 0C    62,154,540       147/138            BK: 14,29; BC: 5,26; WK: 30,31 
2K + 2C vs. 1K + 1C    108,918,432      143/140            BK: 26,27; BC: 5,13; WK: 14; WC: 31 
2K + 2C vs. 0K + 2C    47,500,992       99/80              BK: 17,19; BC: 4,26; WC: 30,31 
1K + 3C vs. 2K + 0C    35,911,512       149/150            BK: 26; BC: 5,6,7; WK: 1,31 
1K + 3C vs. 1K + 1C    62,977,824       153/154            BK: 9; BC: 4,5,8; WK: 10; WC: 13 
1K + 3C vs. 0K + 2C    27,487,512       109/146            BK: 28; BC: 4,8,11; WC: 7,12 
0K + 4C vs. 2K + 0C    7,739,550        155/156            BC: 1,4,8,18; WK: 10,26 
0K + 4C vs. 1K + 1C    13,583,700       153/154            BC: 1,6,16,19; WK: 14; WC: 20 
0K + 4C vs. 0K + 2C    5,933,850        153/148            BC: 5,7,10,14; WC: 15,19 
0K + 2C vs. 0K + 4C    5,933,850        153/148            BC: 5,7,10,14; WC: 15,19 
4K + 0C vs. 3K + 0C    117,804,960      113/114            BK: 3,4,5,26; WK: 1,11,15 
4K + 0C vs. 2K + 1C    309,238,020      149/142            BK: 2,29; BC: 5; WK: 6,7,8,30 
4K + 0C vs. 1K + 2C    269,336,340      149/148            BK: 3; BC: 5,15; WK: 6,18,25,30 
4K + 0C vs. 0K + 3C    77,808,276       151/150            BC: 5,9,11; WK: 16,24,28,29 
3K + 1C vs. 3K + 0C    412,317,360      207/208            BK: 21,28,30; BC: 3; WK: 1,22,32 
3K + 1C vs. 2K + 1C    1,083,045,600    201/202            BK: 1,29,30; BC: 24; WK: 31,27; WC: 11 
3K + 1C vs. 1K + 2C    943,959,744      153/144            BK: 10; BC: 4,5; WK: 14,18,25; WC: 9 
3K + 1C vs. 0K + 3C    272,903,904      159/158            BC: 3,5,9; WK: 8,16,18; WC: 31 
2K + 2C vs. 3K + 0C    538,672,680      245/246            BK: 4,11; BC: 2,5; WK: 3,10,29 
2K + 2C vs. 2K + 1C    1,415,939,616    241/240            BK: 4,32; BC: 5,8; WK: 17,23; WC: 12 
2K + 2C vs. 1K + 2C    1,235,025,792    191/192            BK: 5,27; BC: 12,20; WK: 19; WC: 11,32 
2K + 2C vs. 0K + 3C    357,337,656      161/166            BC: 2,5,12; WK: 3,16; WC: 9,31 
1K + 3C vs. 3K + 0C    311,233,104      249/248            BK: 6; BC: 1,18,15; WK: 5,14,16 
1K + 3C vs. 2K + 1C    818,711,712      253/252            BK: 4; BC: 1,8,10; WK: 9,21; WC: 12 
1K + 3C vs. 1K + 2C    714,675,312      237/238            BK: 5; BC: 7,8,9; WK: 27; WC: 6,19 
1K + 3C vs. 0K + 3C    206,957,504      183/198            BK: 29; BC: 5,7,8; WC: 12,24,30 
0K + 4C vs. 3K + 0C    67,076,100       233/230            BC: 2,4,15,27; WK: 5,16,32 
0K + 4C vs. 2K + 1C    176,588,100      249/248            BC: 1,2,4,6; WK: 28,32; WC: 27 
0K + 4C vs. 1K + 2C    154,280,100      243/242            BC: 4,5,6,8; WK: 28; WC: 12,27 
0K + 4C vs. 0K + 3C    44,717,500       209/210            BC: 4,5,7,11; WC: 6,19,32 

Table 2. Positions with the longest solutions for the 2- to 7-piece databases. 
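
The “Total Positions” column can be cross-checked with a short counting routine. The sketch below is not taken from the database generator; it assumes the standard board numbering (squares 1 to 32), that Black Checkers never stand on squares 29-32 and White Checkers never stand on squares 1-4 (their respective crowning rows), and that only one side to move is counted. The function name `count_positions` is hypothetical.

```python
from math import comb


def count_positions(bk, bc, wk, wc):
    """Count placements of bk/bc Black Kings/Checkers and wk/wc White ones."""
    total = 0
    # i = number of Black Checkers on squares 1-4, where no White Checker can stand.
    for i in range(min(bc, 4) + 1):
        shared = bc - i  # Black Checkers on squares 5-28, shared with White Checkers
        black_ways = comb(4, i) * comb(24, shared)
        white_ways = comb(28 - shared, wc)  # White Checkers on free squares of 5-32
        free = 32 - bc - wc  # Kings may occupy any remaining square
        king_ways = comb(free, bk) * comb(free - bk, wk)
        total += black_ways * white_ways * king_ways
    return total
```

Under these assumptions the routine gives 992 for 1K + 0C vs. 1K + 0C, 760 for 0K + 1C vs. 0K + 1C, and 661,200 for 1K + 1C vs. 1K + 1C.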

3.2 Difficult Theoretical Wins 

A position can be a theoretical win and still be too difficult for a 
program to win, even while consulting a GTV database (see Figures 6 and 
7). This is due to several factors; we mention three of them. 

1. The program will never make a move that loses or gives away the draw 
on any given turn as it consults the GTV databases, but sometimes every 
move in a position will win. The GTV database is of no help in reducing 
the size of the game tree in these instances. 

2. The complete path to the win might not require the use of any of the 
intermediate goals in the evaluation function of the checkers program, so 
as long as the solution is beyond the horizon of the search, the win could 
be postponed. 

3. King-heavy positions will produce principal variations in which the 
Kings tend to wander and gain nothing if forced trades can be avoided. 




Figure 6. A longest conversion win with 3 Kings and 1 Checker versus 3 
Kings. Black to move can force a trade after 149 plies, starting with the 
move 27-24. 

Figure 7. A longest win with 3 Kings and 1 Checker versus 3 Kings. Moving 
the Checker on square 3 will result in a draw; only 28-24 will win. 

Figure 6 comes from the DTC database of Murray Cash (Cash and 
Miller, 2002) and is listed as the longest conversion win in the 7-piece 
database in which a Checker remains unmoved. Comparing this position to 
Figure 19, page 248 from Schaeffer (1997) we note that Schaeffer had two 
of the white Kings on squares 12 and 15 instead of 16 and 19. The Schaeffer 
position and the Cash position both require 207 plies to win. 

Figure 7 is from the Dodgen-Trice PPL database, showing another 
“longest win” possible in the same database slice as the position in Figure 6. 
Listings 2 and 3 show how the PPL database will play each position. 

Listing 2. The PPL database solution to Figure 6. 



27-24, 19-15, 24-20, 16-19, 21-25, 15-18, 25-30, 18-15, 20-24, 19-23, 7-2, 15-10, 30-25, 10-14, 25-22, 
14-10, 24-20, 23-19, 22-25, 19-15, 20-16, 15-18, 16-12, 10-15, 2-6, 15-19, 6-9, 19-23, 9-13, 18-14, 
25-21, 14-10, 13-17, 10-15, 21-25, 15-19, 25-30, 19-24, 17-22, 24-19, 22-26, 23-27, 26-31, 27-24, 30-26, 
19-15, 26-22, 24-20, 31-26, 20-24, 22-17, 15-10, 12-16, 10-15, 17-13, 24-20, 16-12, 15-11, 13-9, 20-24, 
26-31, 24-20, 9-6, 11-15, 6-2, 15-11, 31-26, 11-15, 2-7, 15-18, 7-10, 20-24, 26-31, 24-19, 12-8, 
18-15, 10-7, 15-18, 8-11, 18-14, 31-26, 14-18, 7-10, 19-24, 10-15, 18-14, 26-23, 14-9, 23-19, 9-5, 
11-7, 5-9, 7-10, 24-20, 15-11, 20-24, 19-15, 9-5, 10-6, 5-1, 6-2, 24-20, 15-19, 1-5, 19-16, 
20-24, 16-20, 24-27, 2-6, 5-1, 6-10, 1-5, 11-16, 5-9, 10-15, 9-14, 15-19, 27-31, 20-24, 31-26, 
19-15, 26-22, 24-20, 14-9, 16-19, 9-13, 20-24, 22-17, 19-23, 17-21, 24-19, 13-17, 23-18, 17-13, 18-22, 
21-17, 22-26, 13-9, 19-23, 9-5, 26-30, 5-1, 23-26, 17-13, 15-19, 13-9, 19-23, 9-5, 23-27, 32-23, 
26-19, 5-9, 30-26, 1-5, 26-22, 9-14, 28-32, 5-1, 32-27, 1-5, 27-23, 5-1, 22-18, 14-10, 18-15, 
10-6, 23-26, 6-2, 19-23, 2-6, 26-22, 1-5, 23-18, 6-9, 15-10, 9-13, 18-14, 13-9, 22-18, 9-13, 
10-6, 5-1, 14-10, 13-17, 10-14, 17-10, 6-15, 1-5, 18-14, 5-1, 14-10, 1-5, 10-6, 5-1, 15-10, 
1-5, 6-1, 5-9, 1-5, 9-13, 10-15, 13-17, 15-18, 17-13, 18-22, 13-9, 5-14 

Listing 3. The PPL database solution to Figure 7. 

28-24, 1-6, 24-19, 6-10, 19-23, 10-6, 3-7, 6-2, 7-11, 2-7, 11-15, 22-26, 15-19, 26-22, 19-24, 22-26, 23-19, 
26-31, 24-28, 7-11, 19-24, 32-27, 24-20, 27-32, 21-17, 31-27, 30-25, 27-31, 17-14, 11-15, 20-16, 31-27, 
14-17, 15-18, 25-30, 18-23, 17-22, 27-31, 16-12, 31-27, 22-26, 23-19, 26-31, 27-24, 30-26, 19-15, 26-22, 
24-20, 31-26, 20-24, 22-17, 15-10, 12-16, 10-15, 17-13, 24-20, 16-12, 15-11, 13-9, 20-24, 26-31, 24-20, 
9-6, 11-15, 6-2, 15-11, 31-26, 11-15, 2-7, 15-18, 7-10, 20-24, 26-31, 24-19, 12-8, 18-15, 10-7, 15-18, 8-11, 
18-14, 31-26, 14-18, 7-10, 19-24, 10-15, 18-14, 26-23, 14-9, 23-19, 9-5, 11-7, 5-9, 7-10, 24-20, 15-11, 
20-24, 19-15, 9-5, 10-6, 5-1, 6-2, 24-20, 15-19, 1-5, 19-16, 20-24, 16-20, 24-27, 2-6, 5-1, 6-10, 1-5, 
11-16, 5-9, 10-15, 9-14, 15-19, 27-31, 20-24, 31-26, 19-15, 26-22, 24-20, 14-9, 16-19, 9-13, 20-24, 22-17, 
19-23, 17-21, 24-19, 13-17, 23-18, 17-13, 18-22, 21-17, 22-26, 13-9, 19-23, 9-5, 26-30, 5-1, 23-26, 
17-13, 15-19, 13-9, 19-23, 9-5, 23-27, 32-23, 26-19, 5-9, 30-26, 1-5, 26-22, 9-14, 28-32, 5-1, 32-27, 1-5, 
27-23, 5-1, 22-18, 14-10, 18-15, 10-6, 23-26, 6-2, 19-23, 2-6, 26-22, 1-5, 23-18, 6-9, 15-10, 9-13, 18-14, 
13-9, 22-18, 9-13, 10-6, 5-1, 14-10, 13-17, 10-14, 17-10, 6-15, 1-5, 18-14, 5-1, 14-10, 1-5, 10-6, 5-1, 15-10, 
1-5, 6-1, 5-9, 1-5, 9-13, 10-15, 13-17, 15-18, 17-13, 18-22, 13-9, 5-14 



Listings 4 and 5 show how a program with a GTV database on the strong 
side, searching to a depth of 31 plies for each move, will still allow 
repetition draws against a PPL database defending on the weak side. 
Appendix A contains the proper play for the strong side at each footnote, 
given in [ ] below. 

Listing 4. The PPL database defends the weak side of Figure 6 against a GTV database on 
the winning side, and a draw ensues via repetition. 

27-24, 19-15, 24-20, 16-19, 21-25, 15-18, 20-24, [1] 19-16, 24-20, [2] 16-19, 25-30, 18-15, 07-02, 19-23, 
20-24, [3] 15-10, 30-25, 10-14, 25-22, 14-10, 22-17, 23-18, 24-19, [4] 18-15, 19-23, [5] 15-11, 23-19, [6] 11-15, 
19-16, 15-18, 16-20, [7] 18-15, 17-21, [8] 15-11, 21-25, 11-15, 20-16, 15-18, 16-19, [9] 18-15, 19-23, [10] 
15-11, 23-19, 11-15, 19-24, [11] 15-18, 25-30, [12] 18-23, 24-27, [13] 23-18, 27-24, 18-23, 24-27, [14] 23-18, 
27-24, 18-23, 24-27, [15] 23-18, 27-24, 18-23, 24-27, [16] 23-18, repetition draw. 

Listing 5. The PPL database defends the weak side of Figure 7 against a GTV database on 
the winning side, and a different kind of draw is reached. The positions will repeat in a cycle 
every 66 plies. 



28-24, 01-06, 24-19, 06-10, 19-23, 10-06, 03-07, 06-02, 07-11, 02-07, 11-15, 22-26, 15-19, 26-22, 19-24, 
22-26, 23-18, [17] 26-31, 18-15, 32-28, 15-19, 28-32, 24-28, 07-11, 19-24, 32-27, 24-20, 27-32, 21-17, 
31-27, 30-25, 27-31, 25-22, [18] 31-27, 17-14, [19] 11-15, 22-17, [20] 27-23, 17-13, 23-27, 13-09, [21] 15-19, 
14-17, 19-23, 17-22, 23-19, 09-06, [22] 19-15, 22-17, [23] 15-19, 06-09, [24] 19-23, 09-13, 23-19, 17-21, 
19-23, 13-17, 23-19, 17-14, [25] 19-15, 14-09, [26] 15-18, 21-17, 18-23, 09-13, [27] 23-19, 17-22, 19-15, 
22-25, [28] 15-11, 13-09, [29] 11-15, 20-16, 15-18, 16-11, 27-23, 09-13, 23-19, 13-17, 18-23, 25-22, [30] 19-24, 
17-14, [31] 23-19, 22-17, [32] 19-23, 17-22, 23-19, 22-25, [33] 24-20, 11-08, 20-16, 08-12, 16-20, 14-18, [34] 
20-24, 25-22, [35] 24-27, 22-17, [36] 27-24, 18-14, [37] 24-20, 14-10, [38] 19-23, 17-22, 20-24, 12-08, [39] 
23-19, 08-11, [40] 24-20, 22-17, [41] 19-23, 17-22, 23-19, 10-06, 19-23, 06-09, [42] 23-19, 11-08, 19-23, 09-14, 
20-16, 08-12, 16-20, 14-10, [43] 20-24, 12-08, [44] 23-19, 08-11, [45] 24-20, 22-17, [46] 19-23, 17-22, 23-19, 
10-07, [47] 20-24, 11-08, 24-27, 07-02, 19-15, 02-06, 27-23, 08-12, 15-19, 06-10, 23-27, 10-14, 27-23, 
22-18, [48] 23-27, 14-17, 27-24, 17-22, 24-27, 12-08, [49] 19-23, 18-14, 23-19, 08-11, [50] 27-23, 22-17, [51] 
19-24, 17-22, 23-19, 14-17, 24-20, 11-08, 19-23, 17-14, [52] 20-16, 08-12, 16-20, 14-10, [53] 20-24, 12-08, [54] 
23-19, 08-11, [55] 24-20, 22-17, [56] 19-23, 17-22, 23-19, 10-06, 19-23, 06-09, [57] 23-19, 11-08, [58] 19-23, 
09-14, 20-16, 08-12, 16-20, 14-10, [59] 20-24, 12-08, [60] 23-19, 08-11, [61] 24-20, 22-17, [62] 19-23, 17-22, 
23-19, 10-07, [63] 20-24, 11-08, 24-27, infinite cycle of repetition. 

It is interesting to note that after move 12 in Listing 5, which is from 
Figure 7, the same type of position as Figure 6 is created; i.e., one in which 
the Checker cannot crown since it is being blocked by an enemy King. Even 
with this common theme, two different types of draws result. In Listing 4, a 
“see-saw” draw occurs when the hash table saturates and moves leading to 
the win have all been played before. Recall one of the uses of the hash table 
is to score repeated moves of the same King as a draw, so that you do not 
shuffle the same piece back and forth twenty times and believe you have 
conducted a valid 40 ply search (20 for one side, 20 for the other). Likewise, 
arriving at the same position many times during the search via transposition 
without making progress should be discouraged. The program ends up in the 
undesirable situation where most or all of the winning lines are found in the 
hash table but none force the final simplifying win. They all appear to lead to 
“no progress” due to their high frequency of occurrence in the hash table, yet 
they are the only subset of moves that will win. In Listing 5, a lengthy cycle 
from moves 55 to 88 could theoretically repeat ad infinitum, starting at move 
89. Without being able to search at least 67 plies into the future from move 
55, this cycle cannot be avoided. 

3.3 GTV Database Program vs. PPL Database Program 

An experiment was performed to observe how two different 
Grandmaster-level programs (Waldteufel, 2002; Gilbert, 2002) would play 
against the PPL database from a “longest win” test position. In each case, the 
World Championship Checkers (WCC) program (Dodgen and Trice, 
2001) with the PPL database played the losing side of each ending, and both 
the WYLLIE program and KINGSROW program played the winning side. All 
of the programs had access to a GTV database probed in RAM during the 
search that contained at least all of the 19,055,258,760 7-piece positions 
featuring four against three. The WCC program consulted the PPL database 
when defending the weak side on every move. The starting position for each 
game was the position shown in Figure 5. In this position, Black to move can 
win in 253 plies. 

Listing 6. Wyllie program vs. WCC, February 7, 2003. WCC was able to draw from the 
losing starting position shown in Figure 5. 

08-11, 12-08, 01-05, 09-13, 10-14, 08-03, 11-16, 13-09, 14-18, 09-14, 18-23, 14-18, 23-27, 18-23, 27-32, 
21-17, 32-28, 17-22, 28-24, 22-26, 24-20, 23-27, 16-19, 26-31, 19-24, 27-32, 20-16, 32-28, 16-19, 03-07, 
04-08, 28-32, 19-16, 31-26, 05-09, 26-31, 16-20, 07-02, 08-11, 02-06, 09-13, 06-10, 20-16, 32-27, 24-28, 
27-32, 16-19, 31-27, 11-08, 27-24, 19-23, 10-14, 08-11, 24-20, 11-15, 20-24, 23-26, 24-27, 26-30, 27-31, 
15-11, 31-27, 30-25, 27-23, 25-21, 23-26, 13-17, 26-30, 11-15, 30-26, 15-19, 26-31, 17-22, 31-27, 22-25, 
14-18, 21-17, 27-24, 19-16, 24-27, 25-29, 18-15, 29-25, 15-18, 25-30, 18-23, 17-22, 27-31, 16-12, 31-27, 
22-26, 23-19, 26-31, 27-24, 30-26, 19-15, 26-22, 24-20, 31-26, 20-24, 22-17, 15-10, 12-16, 10-15, 17-14, 
24-20, 16-12, 15-11, 14-17, 11-15, 17-13, 15-11, 26-31, 11-15, 13-09, 15-10, 12-08, 20-16, 31-26, 10-15, 
08-12, 16-20, 09-06, 15-11, 26-22, 11-15, 06-02, 15-19, 22-26, 19-15, 02-07, 15-18, 07-10, 20-24, 10-06, 
18-15, 26-31, 15-11, 06-09, 24-20, 09-05, 11-15, 31-26, 15-11, 26-30, 11-15, 05-09, 15-11, 09-14, 11-15, 
30-25, 15-19, 14-17, 19-23, 17-22, 20-24, 12-08, 23-19, 22-17, 24-20, 17-13, 20-16, 08-12, 16-20, 25-22, 
19-23, 13-17, 20-24, 17-21, 24-20, 22-25, 23-19, 25-30, 19-23, 21-25, 20-24, 12-16, 24-20, 16-11, 23-19, 
11-08, 19-23, 25-22, 20-16, 08-12, 16-19, 22-17, 19-24, 17-14, 24-20, 14-09, 20-24, 12-08, 24-20, 08-11, 
23-19, drawn by agreement. 

Listing 7. Kingsrow program vs. WCC, March 3, 2003. WCC was able to draw from the 
losing starting position shown in Figure 5. 

08-11, 12-08, 01-05, 09-13, 10-14, 08-03, 11-16, 13-09, 14-18, 09-14, 18-23, 14-18, 23-27, 18-23, 27-32, 
21-17, 32-28, 17-22, 28-24, 22-26, 24-20, 23-27, 16-19, 26-31, 19-24, 27-32, 20-16, 32-28, 16-19, 03-07, 
04-08, 28-32, 19-16, 31-26, 16-20, 07-10, 08-11, 26-31, 05-09, 10-06, 09-13, 06-10, 20-16, 32-27, 24-28, 
27-32, 16-19, 31-27, 11-08, 27-24, 19-23, 10-14, 08-11, 24-20, 11-15, 20-24, 23-26, 24-27, 26-30, 27-31, 
15-11, 31-27, 30-25, 27-23, 11-16, 14-18, 13-17, 18-14, 25-21, 23-26, 16-19, 26-31, 17-22, 31-27, 19-15, 
27-31, 21-25, 32-27, 15-19, 27-32, 19-16, 14-10, 16-11, 31-27, 25-21, 10-14, 22-25, 14-18, 21-17, 27-24, 
11-16, 24-27, 25-30, 18-23, 17-22, 27-31, 16-20, 23-19, 20-24, 19-23, 30-25, 32-27, 24-20, 27-32, 20-16, 
31-27, 25-21, 27-24, 21-17, 24-20, 16-12, 20-24, 17-14, 24-20, 14-10, 20-24, 12-08, 23-19, 10-14, 19-23, 
14-09, 24-20, 08-12, 20-24, 09-13, 24-20, 22-25, 23-19, 13-17, 19-23, 25-30, 20-24, 17-22, 24-19, 22-26, 
23-27, 30-25, 27-23, 26-31, 23-18, 25-30, 18-15, 30-26, 19-24, 26-23, 15-11, 31-26, 11-15, 26-22, 24-27, 
23-18, 15-19, 18-14, 27-23, 14-17, 19-24, 17-13, 24-20, 13-09, 20-24, 12-16, 24-20, 16-11, 23-19, drawn 
by agreement. 

The WYLLIE program searched for 10 to 15 seconds per move, averaging 
about 620,000 moves per second during the search. The time duration was 
chosen for practical purposes. This ending was very long, and if we took a 
combined 30 seconds to type our moves to one another, the game would last 
over an hour if 250 plies were required to win. 

The WYLLIE program was unable to win this ending, conceding the draw 
after 196 plies of play. Even after this lengthy engagement, the PPL database 
indicated that the win was 177 plies away for the WYLLIE program. In this 
respect, only 96 plies of progress were observed after 196 plies of actual 
play. The WYLLIE program got as close as 135 plies from the terminal 
position before it began to make non-optimal moves. Listing 6 shows the 
moves made by WYLLIE and WCC during this game. 

The KINGSROW program also played with an average search time of 10-15 
seconds per move, which was increased to 30 seconds per move (upon 
request of the KINGSROW programmer) once three Kings were on the board 
for the winning side. The program was able to get an average search depth of 
33 plies during the course of play. The KINGSROW program got as close as 
159 plies from the terminal position, but it too started to make non-optimal 
moves allowing WCC to push the win further away. At the end of 164 plies 
of play, the win was still 181 plies distant, so KINGSROW netted 72 plies of 
progress after 164 plies. Listing 7 shows the moves made by KINGSROW and 
WCC during this game. 
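
The net-progress figure for KINGSROW follows directly from the database 
distances quoted above. The sketch below only restates that arithmetic; the 
function names are illustrative and are not part of any program described 
here. 

```python
START_DISTANCE = 253  # plies to win from Figure 5, per the PPL database


def net_progress(final_distance, start_distance=START_DISTANCE):
    """Plies by which the strong side reduced the distance to the win."""
    return start_distance - final_distance


def total_slip(plies_played, final_distance, start_distance=START_DISTANCE):
    """Plies lost relative to a game in which both sides play perfectly.

    With perfect play every ply reduces the distance by one, so after
    `plies_played` plies the distance would be start_distance - plies_played.
    """
    return final_distance - (start_distance - plies_played)


# KINGSROW game: 164 plies played, win still 181 plies away at the end.
kingsrow_progress = net_progress(181)
kingsrow_slip = total_slip(164, 181)
print(kingsrow_progress, kingsrow_slip)
```

The second function makes the cost of the non-optimal moves explicit: it is 
the gap between the distance actually reached and the distance that perfect 
play would have reached after the same number of plies. 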

4. Improving Play from the Fourth Position Ending 

There is an arrangement on the checkerboard in which the weak side can 
have one less King than the strong side and still retain a draw. This study 
problem was designated Fourth Position by the checkers fraternity. With one 
slight modification to the arrangement of this position, or by altering the side 
to move, the strong side gains the ability to procure a win which requires 
precise timing of the disposal of one of its pieces. 

Winning the textbook form of Fourth Position requires 81 plies according 
to the PPL database. The original published play from 1756 features a first 
move that would require a total of 85 plies to complete the win, but there 
was also a sub-optimal defensive move in this analysis. The sub-optimal line 
would surrender the game 12 plies more quickly than would the PPL 
database. 

The original solution entailed the well-motivated retreat 22-18 (Payne, 
1756) to start things off, but the PPL database offers 22-25! winning more 
quickly. Even more incredible is the fact that two moves later Black will play 
25-29, a move that is usually strongly discouraged in almost every position 
in which White has a Checker on square 30. The improved PPL solution is 
presented in Listing 8. A subset of the classic solution to Fourth Position is 
presented thereafter, with commentary correcting the play on the defensive 
side. 

Figure 8. Payne's Fourth Position, 1756. Black to move wins in 81 plies. 
White to move draws. 






Listing 8. The PPL solution to Fourth Position from Figure 8. 



22-25!, 31-27, 23-19, 32-28, 25-29!, 27-31, 20-24, 28-32, 24-28, 31-27, 29-25, 27-24, 19-16, 24-27, 16-20, 
27-23, 25-22, 23-27, 22-26, 30x23, 28-24, 32-28, 24x31, 23-19, 20-24, 19-15, 24-19, 15-10, 19-15, 
10-06, 31-26, 28-24, 26-22, 24-28, 22-18, 28-24, 21-25, 24-28, 18-14, 28-32, 25-30, 32-28, 30-26, 28-32, 
14-10, 06-01, 26-23, 01-05, 10-06, 32-28, 23-19, 05-01, 06-10, 01-05, 19-24, 28x19, 15x24, 05-01, 10-14, 
01-05, 24-27, 05-01, 27-23, 01-05, 23-18, 05-01, 14-09, 01-05, 18-14, 05-01, 09-05, 01-06, 05-01, 06-02, 
14-18, 02-07, 18-15, 07-02, 15-11, 02-07, 11x02, Black wins. 

The PPL database was able to identify some play on the weak side of Fourth 
Position that was not optimal. Figure 9 shows the position with White to move 
after 22-18, 31-27, 23-19, which has traditionally been followed by the retreat 
of the King 27-31. The perfect play database announces that this move leads 
to a win in 69 plies for Black, but the optimal line will persist for 12 plies 
longer. The best defense from the position shown in Figure 9 is 32-28, 18-22 
(heading back to square 29, as suggested by the original improvement, is still 
the fastest course of action from here), 27-31, 22-25, 31-27, 25-29, 27-31, 
20-24, 28-32, 24-28, 31-27, 29-25, 27-24, 19-16, 24-27, 16-20, 27-23, 25-22, 
23-27, 22-26, 30x23, 28-24, 32-28, and now 24x31 leads to a 5-piece position 
that White will lose in 58 plies. 

The purpose of the move 32-28 is to prevent the immediate 19-24 by 
Black. It should be noted that as long as White keeps the King on square 28, 
Black cannot play the strong 19-24 attack, which is instrumental in 
concluding the game more quickly. It is not the absence of 27-31 on move 
two for White that extends the life of the weak side, it is the presence of the 
move 32-28. 

5. Conclusions 

The game of checkers is deceptive in its apparent simplicity. Most strong 
contemporary checkers programs have large opening books capable of 
circumventing early losses, and are likewise capable of handling the tactics 
in the middle game beyond the ability of the strongest human players. But, 
as was demonstrated, the endgame domain is still sufficiently complex so as 
to prevent grandmaster-level programs from winning in positions that are 
known wins with as few as seven pieces on the board. This result was rather 
surprising and should underscore the complexity inherent in the game of 
checkers. 

Figure 9. White to move after 22-18, 31-27, 23-19 from Figure 8. 

The Perfect Play databases of Dodgen and Trice are the only databases in 
existence that allow a software program to play the game of checkers 
perfectly in the endgame. The 7-piece perfect play lookup database allows 
the World Championship Checkers program to announce a win from a 
distance of 253 plies. 

We will continue to build larger PPL databases as time and personal 
computer resources will allow. The web site at 
WorldChampionshipCheckers.com will showcase the PPL database building 
progress and other items of interest to checkers and programming 
enthusiasts. 

Appendix A: Footnotes to Imperfect Moves Made 

1. 25-30 wins in 201, but 20-24 allows White 4 additional plies. Cumulative slip = 4 plies. 

2. A rare case where the only move to win is a reversal of the previous move. This indicates that the 
GTV database must be consulted at every node in the tree during the search, a very expensive 
computation. Usually when one side is ahead by one piece, only the root of the tree needs to consult 
the database and prune the moves leading to draws and losses. The reasoning behind this is that in 
many cases, just about every move wins, so probing the database does not prune any legal moves, 
but it does slow down the search a great deal, even if the entire database is RAM-resident. 

3. This retreat again is correct, but shuffling back and forth is usually penalized by an evaluation 
function. Notice the position is changing ever-so-slightly as the weak side has not shuffled back and 
forth over the same moves as the strong side. This is a very difficult position to play properly! 

4. It should be noted that 24-20, again a repeated move on the strong side, would also lead to an 
optimal win in 189. 

5. 19-16 wins in 187, but 19-23 allows White 4 additional plies. Cumulative slip = 8 plies. 

6. Another instance where a reversing of the previous move is the only move to win. In the principal 
variation, the program expects 15-18, a non-optimal defensive move, instead of 15-11, the PPL best 
defense. 

7. At ply 31, the program chooses this over its previous best candidate, 16-12, which was the optimal 
move. 16-12 wins in 185, but 16-20 allows White 2 additional plies. Cumulative slip = 10 plies. 

8. 20-16 wins in 187, but 17-21 allows White 4 additional plies. Cumulative slip = 14 plies. 

9. 16-12 wins in 185, but 16-19 allows White 2 additional plies. Cumulative slip = 16 plies. 

10. 19-16 wins in 187, but 19-23 allows White 4 additional plies. Cumulative slip = 20 plies. 

11. 19-16 wins in 187, but 19-24 allows White 4 additional plies. Cumulative slip = 24 plies. The 
program searched 63 plies and moved instantly for the strong side here, reporting a draw. This is 
because the hash table was saturated with positions consisting of only one move to win. The 
program will not make a drawing or losing move, but all of the moves maintaining the win have 
already been tried. On the strong side, the program tries not to repeat moves, but the weak side has 
created a position that will cycle in the hash table. This is the beginning of some serious trouble. 

12. 24-20 wins in 189, but 25-30 allows White 6 additional plies. Cumulative slip = 30 plies. 

13. 30-25 wins in 195, but 24-27 allows White 2 additional plies. Cumulative slip = 32 plies. 

14. 30-25 wins in 195, but 24-27 allows White 2 additional plies. Cumulative slip = 34 plies. 

15. 30-25 wins in 195, but 24-27 allows White 2 additional plies. Cumulative slip = 36 plies. 

16. 30-25 wins in 195, but 24-27 allows White 2 additional plies. Cumulative slip = 38 plies. At this 
point, the program has slipped into a position that will repeat and allow the weak side to draw. 

17. 23-19 wins in 191, but 23-18 allows White 4 additional plies. Cumulative slip = 4 plies. 

18. 17-14 wins in 179, but 31-27 allows White 4 additional plies. Cumulative slip = 8 plies. 

19. 22-25 wins in 181, but 17-14 allows White 4 additional plies. Cumulative slip = 12 plies. 

20. 14-17 wins in 183, but 22-17 allows White 4 additional plies. Cumulative slip = 16 plies. 

21. 20-16 wins in 183, but 13-09 allows White 8 additional plies. Cumulative slip = 24 plies. 

22. 09-13 wins in 185, but 09-06 allows White 4 additional plies. Cumulative slip = 28 plies. 

23. 06-09 wins in 187, but 22-17 allows White 4 additional plies. Cumulative slip = 32 plies. 

24. The program makes the correct move after searching 63 plies then moving instantly, reporting a 
score of draw for the strong side. This is the same phenomenon that was observed in the other 
position, and the repetition spiral is about to begin. 

25. 21-25 wins in 181, but 17-14 allows White 4 additional plies. Cumulative slip = 36 plies. 

26. 21-25 wins in 183, but 14-09 allows White 8 additional plies. Cumulative slip = 44 plies. 

27. Another instance of hitting the limit of 63 plies of search due to hash table saturation. This is the 
correct move to win optimally but the program is reporting a draw from the repetitions. 

28. Although this is the correct move for the optimal win, the program took over four times as 
long to reach ply 31 on this move as it did on average for the previous moves. 

29. 13-17 wins in 181, but 13-09 allows White 4 additional plies. Cumulative slip = 48 plies. 

30. 17-22 wins in 175, but 19-24 allows White 4 additional plies. Cumulative slip = 52 plies. 

31. 17-21 wins in 177, but 17-14 allows White 4 additional plies. Cumulative slip = 56 plies. 

32. 14-17 wins in 179, but 19-23 allows White 4 additional plies. Cumulative slip = 60 plies. 

33. The classic problem of "wandering Kings" is present here. Now 45 moves into the game, with 31 
moves of regression, the program has only advanced 14 full moves towards the goal. This was, by 
far, the longest time required to complete a 31-ply search, at 27 minutes 16 seconds. 

34. Making the correct move, and spending only 47 seconds to complete 31 plies of search in this 
instance. 

35. 18-22 wins in 171, but 24-27 allows White 4 additional plies. Cumulative slip = 64 plies. 

36. 22-25 wins in 173, but 22-17 allows White 4 additional plies. Cumulative slip = 68 plies. 

37. 18-22 wins in 175, but 18-14 allows White 4 additional plies. Cumulative slip = 72 plies. 

38. 14-18 wins in 177, but 19-23 allows White 4 additional plies. Cumulative slip = 76 plies. 

39. 10-14 wins in 177, but 23-19 allows White 4 additional plies. Cumulative slip = 80 plies. 

40. Another long search, indicative of opportunities to give up the draw appearing in the anticipated line 
of play. 10-14 wins in 179, but 08-11 allows White 8 additional plies. Cumulative slip = 88 plies. 

41. 10-06 wins in 185, but 22-17 allows White 4 additional plies. Cumulative slip = 92 plies. 

42. Making the correct move after a re-search on ply 31, replacing the “wandering” 06-02 move. 

43. 14-17 wins in 175, but 14-10 allows White 4 additional plies. Cumulative slip = 96 plies. 

44. 10-14 wins in 177, but 12-08 allows White 4 additional plies. Cumulative slip = 100 plies. 

45. 10-14 wins in 179, but 08-11 allows White 8 additional plies. Cumulative slip = 108 plies. 

46. 10-06 wins in 185, but 22-17 allows White 4 additional plies. Cumulative slip = 112 plies. 

47. 10-06 wins in 185, but 10-07 allows White 4 additional plies. Cumulative slip = 116 plies. 

48. 14-17 wins in 175, but 23-27 allows White 4 additional plies. Cumulative slip = 120 plies. 

49. 22-25 wins in 173, but 12-08 allows White 8 additional plies. Cumulative slip = 128 plies. 

50. 08-12 wins in 177, but 08-11 allows White 4 additional plies. Cumulative slip = 132 plies. 

51. 14-17 wins in 179, but 22-17 allows White 4 additional plies. Cumulative slip = 136 plies. 

52. 08-12 wins in 175, but 17-14 allows White 4 additional plies. Cumulative slip = 140 plies. 

53. 14-17 wins in 175, but 14-10 allows White 4 additional plies. Cumulative slip = 144 plies. 

54. 10-14 wins in 177, but 12-08 allows White 4 additional plies. Cumulative slip = 148 plies. 

55. 10-14 wins in 179, but 08-11 allows White 8 additional plies. Cumulative slip = 156 plies. 

56. 10-06 wins in 185, but 22-17 allows White 4 additional plies. Cumulative slip = 160 plies. 

57. As was seen in the position at [42], here too the correct move was made after a re-search on ply 31, 
replacing the "wandering" 06-02 move. 

58. Another correct move, played here at move 94, leads to the same position at move 60, which was 68 
plies ago. 

59. 14-17 wins in 175, but 14-10 allows White 4 additional plies. Cumulative slip = 164 plies. The 
position here at move 97 is the same as was seen at move 63, playing in the cycle from 68 plies ago. 
See note [43]. 

60. 10-14 wins in 177, but 12-08 allows White 4 additional plies. Cumulative slip = 168 plies. The 
position here at move 98 is the same as was seen at move 64, playing in the cycle from 68 plies ago. 
See note [44]. 

61. 10-14 wins in 179, but 08-11 allows White 8 additional plies. Cumulative slip = 176 plies. The 
position here at move 99 is the same as was seen at move 65, playing in the cycle from 68 plies ago. 
See note [45]. 

62. 10-06 wins in 185, but 22-17 allows White 4 additional plies. Cumulative slip = 180 plies. The 
position here at move 100 is the same as was seen at move 66, playing in the cycle from 68 plies 
ago. See note [46]. 

63. 10-06 wins in 185, but 10-07 allows White 4 additional plies. Cumulative slip = 184 plies. The 
position here at move 102 is the same as was seen at move 68, playing in the cycle from 68 plies 
ago. See note [47]. 
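
The "Cumulative slip" value in these notes is a running sum of the extra plies 
conceded by each non-optimal move. A minimal sketch of that bookkeeping, 
using the per-move slips reported in notes 17 through 23 (the variable names 
are illustrative): 

```python
from itertools import accumulate

# Extra plies conceded by each non-optimal move, taken from notes 17-23.
slips = [4, 4, 4, 4, 8, 4, 4]

# Running total after each note, i.e. the "Cumulative slip" figures.
cumulative = list(accumulate(slips))
print(cumulative)
```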
Abstract This paper describes the design and development of two world-class Lines of Action 
game-playing programs: YL, a three-time Computer Olympiad gold-medal 
winner, and Mona, which has dominated international e-mail correspondence 
play. The underlying design philosophy of the two programs is very different: 
the former emphasizes fast and efficient search, whereas the latter focuses on a 
sophisticated but relatively slow evaluation of each board position. In addition to 
providing a technical description of each program, we explore some long-standing 
questions on the trade-offs between search and knowledge. These experimen- 
tal results confirm the conclusions made by earlier researchers in the domain of 
chess, thus showing that the trends are not game-specific. In particular, we see 
diminishing returns with additional search depth, and observe that the knowledge 
level of a program has a significant impact on the results of such experiments. 

Keywords: Lines of Action, search, knowledge 

1. Introduction 

One of the most important considerations when designing a strategic game- 
playing program is the trade-off between knowledge and search. 

To decide on the best move continuation, programs typically perform a look- 
ahead search, evaluate the positions at the leaves of the search tree, and then 
propagate those values back to the root using the minimax principle. A pro- 
gram that uses a sophisticated but time-consuming board evaluation can more 
accurately determine the merit of each game-state visited, at the cost of sacri- 
ficing some of the look-ahead depth. Conversely, a program that uses a faster 
but less sophisticated board evaluation method can perform a deeper search, 
improving its short-term tactical ability. There is also compensation toward 
better knowledge, in that each additional level of search provides a more re- 
fined approximation of the value of each preceding position. 

The trade-off between knowledge vs. search has spurred a considerable 
amount of research interest in the past, mainly for the game of chess (Schaeffer, 
1986; Berliner et al., 1990; Junghanns and Schaeffer, 1997; Heinz, 2000). This 
paper provides further insights using the game of Lines of Action (LoA for short) 
as a new test-bed. LoA is tactically and strategically complex, and programs 
can employ many of the advanced search techniques and enhancements used 
by successful chess programs. 

Two of the world’s strongest LoA programs were developed hand-in-hand 
at the University of Alberta (Billings, 2000). One program, YL, developed by 
Yngvi Bjornsson, uses a very fast but somewhat restricted framework for board 
evaluation, allowing deep look-ahead. The other program, Mona, developed 
by Darse Billings, employs a relatively slow evaluation function resulting in 
shallower search, but with added features that provide a better assessment of 
each position encountered. This fundamental difference in design philosophy 
provides an opportunity to investigate the relative importance of knowledge 
versus search. 

The main contributions of this paper are: (a) descriptions of the design and 
cooperative development process of two high-performance game-playing pro- 
grams, and (b) experimental investigation of some general trade-offs between 
search and knowledge, using a domain that is different from chess, but belongs 
to the same class of games. 

The next section briefly presents the rules of LoA and summarizes important 
strategic game concepts to be considered by programs. Section 3 describes 
some of the many benefits of the co-development process. Sections 4 and 5 
provide detailed technical descriptions of YL and Mona, respectively. Section 
6 provides empirical results and some knowledge versus search experiments 
using the two programs. Finally, Section 7 summarizes the content and states 
conclusions. 

2. Lines of Action 

Lines of Action was invented by Claude Soucie in the early 1960s, and 
was popularized by Sid Sackson in his book “A Gamut of Games” (Sackson, 
1969). The simple, elegant rules are now presented, along with an overview 
of some of the important strategic concepts that a high-performance program 
might consider incorporating. 

2.1 Rules 

Objective. The object of the game is to move all of your pieces into a single 
connected group. Pieces may be connected diagonally or orthogonally. The 
leftmost diagram in Figure 1 shows the initial board layout. 

Figure 1. Initial LoA board layout, blockades, and mate-threat examples. 

Movement. Black moves first, and players alternate, moving one piece per 
turn. A piece may move horizontally, vertically, or diagonally. Along a given 
line, the distance a piece moves is the same as the total number of pieces (of 
both colours) on that line. You may jump over your own pieces, but not your 
opponent’s pieces. You may land on and capture your opponent’s pieces, which 
are then removed from the game. You may not land on your own pieces. 
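
The movement rule can be made concrete with a short move generator. The 
sketch below is illustrative only: the board encoding and function names are 
hypothetical, and the initial layout (Black on the top and bottom edges, White 
on the left and right edges, corners empty) is assumed to be the standard one, 
since the diagram itself is not reproduced here. 

```python
EMPTY, BLACK, WHITE = ".", "B", "W"
# One direction per line orientation: rank, file, and the two diagonals.
DIRECTIONS = [(0, 1), (1, 0), (1, 1), (1, -1)]


def initial_board():
    """Assumed standard start: 12 pieces per side on opposite edges."""
    board = [[EMPTY] * 8 for _ in range(8)]
    for i in range(1, 7):
        board[0][i] = board[7][i] = BLACK
        board[i][0] = board[i][7] = WHITE
    return board


def on_board(r, c):
    return 0 <= r < 8 and 0 <= c < 8


def line_count(board, r, c, dr, dc):
    """Count pieces of both colours on the whole line through (r, c)."""
    count = 0
    for sign in (1, -1):
        # The forward walk includes the start square; the backward walk skips it.
        rr, cc = (r, c) if sign == 1 else (r - dr, c - dc)
        while on_board(rr, cc):
            if board[rr][cc] != EMPTY:
                count += 1
            rr += sign * dr
            cc += sign * dc
    return count


def legal_moves(board, player):
    """Return all ((from_row, from_col), (to_row, to_col)) moves for player."""
    opponent = WHITE if player == BLACK else BLACK
    moves = []
    for r in range(8):
        for c in range(8):
            if board[r][c] != player:
                continue
            for dr, dc in DIRECTIONS:
                dist = line_count(board, r, c, dr, dc)
                for sign in (1, -1):
                    sr, sc = sign * dr, sign * dc
                    tr, tc = r + sr * dist, c + sc * dist
                    # Must stay on the board and not land on an own piece.
                    if not on_board(tr, tc) or board[tr][tc] == player:
                        continue
                    # Squares strictly between start and target: own pieces
                    # may be jumped, opponent pieces may not.
                    path = [board[r + sr * k][c + sc * k] for k in range(1, dist)]
                    if opponent not in path:
                        moves.append(((r, c), (tr, tc)))
    return moves


opening_moves = legal_moves(initial_board(), BLACK)
print(len(opening_moves))
```

Landing on an opponent piece is accepted as a capture; removing the captured 
piece is left to whatever routine applies the move. 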

2.2 Strategic Concepts 

In chess and checkers, having more pieces than the opponent is highly corre- 
lated with winning, and this property outweighs all other factors in importance. 
In contrast, there is no single dominant feature in the assessment of LoA po- 
sitions. As such, it is quite common to make concessions in one positional 
factor in order to strengthen another, and strong programs are frequently able to 
manage these trade-offs in order to maximize several features simultaneously. 

We now list some of the important principles of LoA encountered during the 
development of Mona and YL. 

Material. In LoA, there is no clear consensus on whether having extra 
material is advantageous, neutral, or detrimental. Since the goal of the game 
is to connect all of one’s pieces into a single group, having fewer pieces can 
require less work to fully coordinate them. In contrast, having more pieces 
might make it easier to form one large group, and might also enable better 
control of the board, preventing the opponent from connecting their pieces. 
It may be the case that having more pieces than the opponent only offers an 
indirect advantage, by increasing the value of other properties mentioned in this 
section. Those indirect advantages may exceed the added liability of managing 
the extra pieces, yielding a net positive effect for material advantage. However, 
since those other attributes are being measured separately, the weight assigned 
to material difference may be zero, or negative. 

Mobility. As with many other board games, it is normally advantageous 
to have a position with many options and possible continuations. Increased 
mobility generally entails increased flexibility. Simply having many moves 
available can make it easier for a player to develop their own plans, interfere 
with the opponent’s plans, and defend against an opponent’s immediate threats. 
Moreover, several types of moves can be identified, such as: moves that capture 
a piece, moves toward or away from the center of the board, moves that connect 
our pieces, or moves that cut an opponent group. Each of the distinct move types 
can then be evaluated differently. Having the move (i.e., being the next player 
to move in a given position) can also be treated as a distinct characteristic of 
the position. The value of this privilege depends on other positional properties, 
and generally increases toward the end of the game. 

Centrality. In LoA, controlling the center of the board is very important. 
This can be accomplished by direct occupation of the more central squares, or 
by tactical counter-measures that prevent the opponent from occupying those 
squares. The center is particularly important in view of the standard starting 
position. Since each side must unite pieces from opposite sides of the board, 
seizing control of the center gains the shortest route to unification, while simul- 
taneously interfering with the opponent’s connection. Having a bias toward 
centrality also has added pragmatic value, giving the program a sound high- 
level “plan” of bringing its pieces together in the middle of the board. 

Piece Coordination. There are several identifiable concepts under the broad 
heading of piece coordination. Each of these is normally with respect to the 
pieces of the same colour. First, we say two pieces are connected if they 
are orthogonally or diagonally adjacent to each other. A group is a strongly 
connected subset of pieces (the object of the game being to form a single group). 
A program may designate a main group to be the largest group, or perhaps the 
most central group. The concept of connectivity can be measured as the number 
of pairwise connections between pieces; or the total number of groups; or the 
number of pieces that are not connected to the main group. The proximity 
or cohesion is a measure of distance or scatteredness of a player’s pieces. An 
outlier is an isolated piece (typically at the edge of the board) that needs to be 
brought into connection with or proximity of the main group, or the majority 
of like-coloured pieces. 
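
The connectivity measures listed above can be computed with a simple flood 
fill. The following is a minimal sketch, assuming pieces of one colour are 
given as a set of (row, column) squares; the function names are hypothetical 
and the main group is taken to be the largest one. 

```python
def groups(pieces):
    """Partition squares into groups connected orthogonally or diagonally."""
    remaining = set(pieces)
    result = []
    while remaining:
        start = remaining.pop()
        group, stack = {start}, [start]
        while stack:
            r, c = stack.pop()
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    neighbour = (r + dr, c + dc)
                    if neighbour in remaining:
                        remaining.remove(neighbour)
                        group.add(neighbour)
                        stack.append(neighbour)
        result.append(group)
    return result


def pairwise_connections(pieces):
    """Count adjacent pairs; four of the eight directions avoid double counting."""
    pieces = set(pieces)
    half = ((0, 1), (1, -1), (1, 0), (1, 1))
    return sum((r + dr, c + dc) in pieces for r, c in pieces for dr, dc in half)


example = {(0, 0), (1, 1), (1, 2), (5, 5), (7, 0)}
found = groups(example)
main_group = max(found, key=len)
outside_main = len(example) - len(main_group)
print(len(found), pairwise_connections(example), outside_main)
```

A position is won for a side when `groups` returns exactly one group for that 
side's pieces. 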

Obstructions. An opponent’s piece or group of pieces may constitute an 
obstruction to connection. A piece may block one direction of movement of 
an enemy piece. A strong defensive formation is a blockade along the second 
rank or file, which greatly restricts the mobility of enemy pieces along the edge, 
and disconnects them from other like-coloured pieces. The effectiveness of an 
enemy blockade can be greatly reduced by having a foothold, which is a piece 
on the second rank that extends the edge group toward the center, and creates 
a defect in the blockade wall. The center diagram in Figure 1 shows a strong 
blockade for White along the top edge, and a White foothold (labeled ’1’) in 
Black’s blockade on the left. 

Mate Threats. A mate threat is a threat to win the game on the next move, 
by connecting all pieces into one group. In general, mate threats are devastating 
in LoA, since the opponent typically must weaken their position considerably to 
answer the threat. Given the rather highly constrained movement options, this 
will commonly lead to subsequent mate threats, until finally there is no adequate 
response. The rightmost diagram in Figure 1 shows an example position where 
a long sequence of mate threats secures Black a win. Since they frequently lead 
to forced winning sequences, it can be worthwhile to detect statically certain 
types of mate threats, and give a large evaluation bonus for each one present. 
This property of LoA also encourages special-purpose algorithms near the end 
of the game, such as threat-based search, or the proof-number techniques seen 
in Sakuta et al. (2002) and Winands, Uiterwijk, and Van den Herik (2002). 

3. Co-development, Co-evolution 

The development of both YL and Mona began in February of 2000. Since 
YL was expected to be clearly superior in terms of engineering and search speed, 
the author of Mona decided early on to focus on having superior knowledge 
in the form of a better-informed evaluation function. This turned out to be a 
fortuitous decision, as the contrasting styles of play enabled both sides to learn 
far more from friendly contests than would have otherwise been possible, and 
the progress of both programs was greatly accelerated as a result. 

YL is based on a very fast framework, and its evaluation function is fully 
incremental, meaning that it has very little work to do at each leaf node. In con- 
trast, Mona has a work-intensive evaluation function applied to each leaf node. 
Overall, YL is about 22 times faster than Mona in terms of positions processed 
per second. In compensation, Mona evaluates each position somewhat more 
thoroughly. 

To build a high-performance search engine like YL, the philosophy is: “start 
fast and stay fast”, meaning that speed considerations and optimizations must 
be made at every stage of development. However, it can become increasingly 
difficult to make significant changes to the highly constrained architecture. In 
contrast, the design of Mona is basic and flexible, so new features can be 
added without difficulty. Since the evaluation function is already very costly, 
new attributes can be added that are rather expensive to compute, with only a 
minimal impact on overall speed performance. One might say “if you’re slow 
anyway, take advantage of it!”. 

These fundamental differences in approach significantly enhanced the co- 
evolution of YL and Mona. A much broader range of positions were explored 
than would have been seen with self-play matches, and critical weaknesses in each 
program were quickly revealed when playing into the other program’s strength. 
The advantages of cooperative development do not end there. Since evaluation 
features are easy to add and experiment with in Mona, the slower program 
could be used as a proving ground for new ideas. If certain properties prove 
to be extremely valuable, they could then warrant the more difficult changes in 
YL. One example of this cross-fertilization occurred with “footholds” (a piece 
on the second rank that diminishes the effect of an opponent blockade, as shown 
in Figure 1, and discussed in Section 5). This feature turned out to be so valuable 
in practice that a special effort was made to detect similar patterns within the 
framework of YL’s fast evaluation function. The co-evolution worked in both 
directions. For example, games that Mona lost to YL due to short-term tactical 
errors suggested game-specific knowledge that could be added to reduce the 
risk associated with that weakness. As a result, some of Mona’s knowledge 
is designed to compensate directly for the shallower search. 

The development of Mona and YL was greatly facilitated by two e-mail 
games played against Kerry Handscomb (one of the strongest LoA players in 
the world, having tied for first place in the 2000 e-mail championship). Early 
versions of Mona and YL combined efforts against him, choosing the move 
with highest average score. The lessons learned from those games, and Kerry’s 
commentary, resulted in major improvements to the evaluation functions of 
both programs. Kerry also wrote a series of informative articles on LoA for 
the magazine Abstract Games (Handscomb, 2003) (see issues 1-3, and others). 
Another valuable source of LoA domain knowledge was Dave Dyer’s (2003) 
excellent website for the game, which includes the game records of past e-mail 
championships. For other resources, see the Mona and YL webpage (Billings,
2000).

4. YL 

This section describes the architecture of YL, including the underlying 
framework, the board-evaluation scheme, and the search algorithm. 

4.1 Line Decomposition 

The program evaluates board positions line by line — that is, each file, rank, 
and diagonal is evaluated independently. The score of a board position is the 
sum of the scores of its lines. The board is decomposed into 32 lines as shown 
in Figure 2. The first diagram pictures the 8 files, the second diagram the 
8 ranks. The two remaining diagrams show the diagonals, which are paired 
to form 8-square-long diagonals, hereafter referred to as extended diagonals.
This pairing is done to achieve a more compact representation. An extended 
diagonal is still considered as two distinct diagonals for evaluation purposes. 

Evaluating the lines independently makes it possible to use a fast table
look-up evaluation scheme to score the board during game play. For each line there
are only $3^8 = 6561$ possible piece configurations. The total number
of configurations ($32 \times 6561$) is thus small enough to be evaluated beforehand.
This evaluation is done at program startup and stored in tables residing in
memory, called evaluation tables. The game is divided into three game phases
(beginning, middle, and endgame) and different evaluation tables are used for
each phase.

Figure 2. Example lines: file, rank, and extended diagonals.

The program represents a board position internally using integers, one for 
each line. Each integer takes a value in the range 0 to 6560, representing the 
current piece configuration for the line. Piece configurations are mapped into 
integers as: 

$$S_8 \times 3^7 + S_7 \times 3^6 + \cdots + S_2 \times 3^1 + S_1 \times 3^0$$

where $S_i$ identifies the occupant of the i-th square on the line (empty = 0,
black = 1, and white = 2). These piece-configuration numbers are updated
incrementally as moves are made on the board. Removing or adding a piece
from/to a square affects only the configuration of the four lines intersecting the
affected square. One benefit of this representation is that the piece-configuration
numbers can be used directly as indices into the evaluation tables. Evaluating
a board position is then simply a matter of looking up the merit of each line 
in the evaluation table. This evaluation is also done incrementally by keeping 
track of the current board score, and adjusting it by the evaluation differences 
of only the line configurations that changed during a move. This requires only 
a few table lookups. This efficient way of representing and evaluating boards 
is not new. Similar board representations are used by some high-performance 
Othello programs (Buro, 1999). 
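
The encoding, the incremental update, and the table look-up described above can be sketched as follows. This is a minimal illustration under stated assumptions, not YL's actual code: the names `encode_line`, `update_line`, and `build_table`, the 0-based square index, and the toy scoring feature are all hypothetical.

```python
EMPTY, BLACK, WHITE = 0, 1, 2
POW3 = [3 ** i for i in range(8)]  # POW3[i] is the weight of square S_(i+1)


def encode_line(squares):
    """Map the 8 occupants S_1..S_8 of a line to an integer in 0..6560."""
    return sum(s * POW3[i] for i, s in enumerate(squares))


def update_line(config, index, old, new):
    """Change the occupant of one square (0-based index) without re-encoding."""
    return config + (new - old) * POW3[index]


def build_table(score_line):
    """Precompute a score for each of the 3**8 line configurations."""
    table = []
    for config in range(3 ** 8):
        squares = [(config // POW3[i]) % 3 for i in range(8)]
        table.append(score_line(squares))
    return table


# Hypothetical toy feature: black pieces minus white pieces on the line.
toy_table = build_table(lambda sq: sq.count(BLACK) - sq.count(WHITE))

line = [BLACK, EMPTY, EMPTY, WHITE, EMPTY, EMPTY, EMPTY, BLACK]
config = encode_line(line)
config_after = update_line(config, 3, WHITE, EMPTY)  # the white piece is removed

# The board score changes by the difference of two table entries for this line.
score_delta = toy_table[config_after] - toy_table[config]
```

A real evaluation table would hold one such array per line and per game phase; the point of the sketch is only that a move touches a handful of integers and table entries.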

4.2 Evaluation Function 

The main attraction of the aforementioned line-by-line evaluation scheme is
its efficiency. On the downside, the types of features that can be expressed within
this framework are necessarily somewhat restricted. However, several important
features can be measured precisely (including material balance, number
of connections, and piece mobility), whereas other features must be approximated
(such as proximity, obstruction, and blockage effectiveness). Where
such approximations are not sufficient we use special non-line-based patterns.

Material. YL has a slight dislike for being up material, increasing as the game 
progresses. Note that this does not necessarily imply that it is bad to capture 
pieces. Rather, YL captures pieces only if it gives positional advantages. 

Figure 3. Example of a blockage penalty, and proximity bounding boxes.

Mobility. The program distinguishes between four types of moves (in decreasing
order of importance): capture moves, moves that establish a connection,
regular moves, and moves that disconnect the program’s own piece formation. Note that
a single move can belong to more than one category. The merit of such a move
is the combined merit from all relevant categories. Also, the side to move gets
a constant bonus.

Centrality. A bonus is given for both direct and indirect center control; the
more centralized a piece is, the higher the bonus it gets. Pieces sitting on the edge
of the board are penalized, although somewhat less if they can move towards
the center.

Piece coordination. YL measures connectivity by summing the number of 
neighbours of the same colour over all pieces on the board. It also measures 
line-wise piece proximity (how far apart pieces lie on a line). However, experience
showed this measure to be insufficient on its own. Therefore, before
the Computer Olympiad in 2002, a new non-line-based proximity feature was
added. It keeps track of the area (number of squares) of the minimal bounding
box needed to enclose the pieces of each side; the smaller the box, the higher the bonus
a side gets. These boxes also encourage outliers to start gravitating toward the
rest of the pieces. In Figure 3, the diagram on the right shows the bounding 
boxes for both sides. 

The program has no notion of how many groups there are on the board 
except, of course, detecting the “single group” end-of-game condition. A single 
remaining group is detected by doing a breadth-first search over all neighbours 
of the same colour, starting with the piece last moved. If the number of pieces 
visited is equal to the number of pieces a player has, we know the pieces form 
a single group (after a capture move, the end-of-game condition must also be
checked for the opposing side). Detecting the single-group condition initially
slowed the program down significantly. However, by keeping two bitmasks for
each side indicating which columns and rows its pieces occupy, we can cheaply
check necessary conditions for a single group being formed; only when these
hold do we run the more expensive connection test. In practice, this trick eliminates
most calls to the breadth-first search. These bitmasks are also used to efficiently 
keep track of the proximity bounding boxes. 
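
The single-group test, the bitmask pre-check, and the bounding-box area can be sketched together. This is an illustrative sketch, not YL's implementation: the masks are rebuilt from a piece list here rather than maintained incrementally, and the particular necessary condition used (occupied rows and occupied columns must each form an unbroken run) is an assumption about what such a cheap check could look like. All function names are hypothetical.

```python
from collections import deque


def occupancy_masks(pieces):
    """Bitmasks of the rows and columns that hold at least one piece."""
    row_mask = col_mask = 0
    for r, c in pieces:
        row_mask |= 1 << r
        col_mask |= 1 << c
    return row_mask, col_mask


def contiguous(mask):
    """True if the set bits of mask form one unbroken run."""
    low = mask & -mask                  # lowest set bit
    return ((mask + low) & mask) == 0   # the carry clears exactly one run


def span(mask):
    """Distance from lowest to highest set bit, inclusive (mask > 0)."""
    return mask.bit_length() - (mask & -mask).bit_length() + 1


def bounding_box_area(pieces):
    row_mask, col_mask = occupancy_masks(pieces)
    return span(row_mask) * span(col_mask)


def is_single_group(pieces, start):
    """Cheap mask test first, then breadth-first search over 8-neighbours."""
    pieces = set(pieces)
    row_mask, col_mask = occupancy_masks(pieces)
    if not (contiguous(row_mask) and contiguous(col_mask)):
        return False                    # a gap in rows or columns: not connected
    seen = {start}
    queue = deque([start])
    while queue:
        r, c = queue.popleft()
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nxt = (r + dr, c + dc)
                if nxt in pieces and nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
    return len(seen) == len(pieces)
```

The mask test can only rule a position out; a position that passes it may still consist of several groups, which is why the breadth-first search remains the final arbiter.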

Obstructions. YL gives an additional penalty to edge pieces that are fully or
partially blocked by the opponent’s pieces. For example, in the diagram on
the left in Figure 3 both white pieces are penalized. The number of penalty
points is determined by the length of the blocked lines: piece 1 gets in total 11
penalty points (shown with an ’ 1’) whereas piece 2 is penalized by only 1
point (shown with a ’2’). However, this scheme overestimates the penalty for
blockades with footholds (the blockade is less effective because the edge pieces 
are connected to the outside via the foothold piece). A line-by-line evaluation 
scheme is unable to detect such situations. Footholds do occur frequently 
enough in practice to warrant a special treatment. Thus a special configuration 
pattern is used in YL that looks at the second file/rank in conjunction with the 
first, allowing the program to detect whether blocked edge pieces are connected 
to the outside and then scale down the blockade penalty appropriately. 

YL also detects (along lines) how many opposite-coloured pieces lie
between its own pieces, slightly penalizing such obstructions.

4.3 Evaluation Weights 

Each line is evaluated using the aforementioned features, which are then
combined into a single line value using a linear function. A linear function was
chosen somewhat arbitrarily. In practice we could equally well have used a more
complex non-linear function without sacrificing performance. However, we
have not experimented with such alternatives.

Instead of hand-tuning the evaluation weights, we initially used a
temporal-difference learning method for determining the relative importance of each
evaluation feature. This was a sensible decision given the limited expert knowledge
about the game, and allowed us to obtain an initial set of weights superior to what
we could come up with by hand. A version of the program using the learned
weights won the gold medal at the Computer Olympiad held in London in 2000.
However, further tuning of the weights (mainly based on observations from
tournament play) and, in particular, introduction of new evaluation features have
since then significantly increased the program’s playing strength.

4.4 Search 

YL uses a traditional alpha-beta-based search algorithm (more specifically
Principal Variation Search (Marsland, 1982)). The algorithm is augmented
with many state-of-the-art enhancements, such as: iterative deepening,
aspiration windows, a two-level transposition table, an extensive automatically built
opening book, repetition detection, and thinking on opponent’s time. The
program also employs two well-documented speculative pruning schemes:
null-move (Beal, 1989) and multi-cut pruning (Björnsson and Marsland, 2001).

As mentioned earlier, evaluation of game positions is done incrementally
using table lookups. Move generation is also relatively fast, in part because
legal moves for all possible line configurations are pre-calculated at program
startup. The program employs both static (based on evaluation-table values) and
dynamic (transposition-table move, killer-moves and history-heuristic)
move-ordering techniques. A hierarchical move-ordering approach is used: in the
upper levels of the tree (closer to the root) a more sophisticated move ordering
is employed whereas at the lower levels a faster, although somewhat less
sophisticated, ordering mechanism is used. Altogether, this results in a very
fast and efficient search (the program typically explores close to 1.5 million
positions per second in the opening on a 2.4GHz P4 PC).
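
For readers unfamiliar with Principal Variation Search, the core idea can be sketched on a toy game tree. This is a generic textbook formulation in negamax style, not YL's code, and it omits every enhancement listed above (transposition table, move ordering, pruning). The nested-list tree representation and the function name `pvs` are assumptions made for the example; leaf scores are integers from the viewpoint of the side to move at the leaf.

```python
INF = float("inf")


def pvs(node, alpha, beta):
    """Principal Variation Search over a nested-list tree (fail-hard)."""
    if not isinstance(node, list):
        return node                      # leaf: static score
    first = True
    for child in node:
        if first:
            # The first move is searched with the full window.
            score = -pvs(child, -beta, -alpha)
            first = False
        else:
            # Later moves get a cheap null-window probe ...
            score = -pvs(child, -alpha - 1, -alpha)
            if alpha < score < beta:
                # ... and a full re-search only if the probe beats alpha.
                score = -pvs(child, -beta, -alpha)
        alpha = max(alpha, score)
        if alpha >= beta:
            break                        # cutoff
    return alpha


# Two-ply example: the root player picks the branch with the best worst case.
tree = [[3, 5], [2, 9], [6, 1]]
root_value = pvs(tree, -INF, INF)
```

The scheme pays off when move ordering is good: if the first move searched is usually the best one, almost all remaining moves are dismissed by the null-window probe alone.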

5. Mona 

This section describes the search engine and the evaluation function of 
Mona. As stated before, emphasis is on evaluation. 

5.1 Search Engine Components 

Mona is a fairly basic alpha-beta search program, using Principal Variation
Search. The data structures for board representation, evaluation features,
and move lists are simple integer arrays. Most of the well-established search
enhancements are used, such as iterative deepening, transposition tables (with
embellished Zobrist hashing), and the null-move heuristic.

Since move generation is used in the leaf-level evaluation function as well as 
the search process, the program spends a significant fraction of its execution time 
in this procedure. An optimization called move gathering was implemented, 
where all possible moves for each line configuration (row, column, or diagonal) 
are pre-computed, and those short move lists are concatenated at runtime. This 
resulted in a 40% speed-up to the program. A further 200-300% speed-up might 
be possible by incrementally carrying the move list indices (the new bottleneck), 
but this was not done prior to the program’s retirement in 2001. 
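
The idea behind move gathering can be sketched as a cached per-line move generator whose short results are concatenated at runtime. This is a hedged sketch, not Mona's code: it caches on demand with `functools.lru_cache` instead of filling tables at startup, represents a line as a tuple of occupants, and uses hypothetical names (`line_moves`, `gather_moves`). It relies on the standard LoA movement rule: a piece moves exactly as many squares as there are pieces on the line, may jump over friendly pieces but not enemy pieces, and may land on an enemy piece (capturing it) but not on a friendly one.

```python
from functools import lru_cache

EMPTY, BLACK, WHITE = 0, 1, 2


@lru_cache(maxsize=None)
def line_moves(line, side):
    """All (from_index, to_index) moves for `side` along one line."""
    count = sum(1 for s in line if s != EMPTY)   # required move distance
    enemy = BLACK + WHITE - side
    moves = []
    for src, occupant in enumerate(line):
        if occupant != side:
            continue
        for step in (-1, 1):
            dst = src + step * count
            if not 0 <= dst < len(line) or line[dst] == side:
                continue                         # off the line or own piece
            between = line[src + step:dst:step]  # squares strictly in between
            if enemy not in between:
                moves.append((src, dst))
    return tuple(moves)


def gather_moves(lines, side):
    """Concatenate the cached per-line lists as (line_id, from, to)."""
    moves = []
    for line_id, line in enumerate(lines):
        moves.extend((line_id, src, dst) for src, dst in line_moves(line, side))
    return moves


example = (BLACK, EMPTY, EMPTY, WHITE, EMPTY, EMPTY, EMPTY, BLACK)
black_moves = line_moves(example, BLACK)   # includes a capture on index 3
```

Because the generator takes the line as a tuple of any length, the same function also covers the shorter diagonals.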

Good move ordering is accomplished with a two-level hierarchy. First, the
transposition table move is considered (i.e., the move which produced the best
score the previous time the position was encountered, typically in the previous
iteration). The second level is the default static move ordering, which ranks
the general desirability of each move. A move toward the center of the board is
rated higher (specifically, the centrality of the destination square), and capture
moves are given a small bonus. Since this ranking is over a fixed interval (2-16),
a linear-time radix sort is used to order the move list. This default move
ranking is built into the move generator to reduce overhead. Move ordering
with killer moves and the history heuristic is available as an option, but does not
significantly increase program performance.
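
Because the static ranks fall in a small fixed interval, the move list can be ordered without comparisons. The sketch below uses a single-pass bucket sort, which is what a radix sort reduces to when the key is one small integer; it is an illustration only, and the function name, the `(rank, move)` pair format, and the string move labels are hypothetical.

```python
def order_moves(ranked_moves, low=2, high=16):
    """Order (rank, move) pairs by descending rank in linear time."""
    buckets = [[] for _ in range(high - low + 1)]
    for rank, move in ranked_moves:
        buckets[rank - low].append(move)   # one bucket per possible rank
    ordered = []
    for bucket in reversed(buckets):       # highest rank first
        ordered.extend(bucket)
    return ordered


ordered = order_moves([(5, "a"), (16, "b"), (2, "c"), (5, "d")])
```

Moves with equal rank keep their generation order, so the result is deterministic for a given move generator.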







Since a prominent odd-even effect is observed in evaluations, and since mate
threat detection (described below) only occurs on odd-ply searches, Mona
iterates 2-ply at a time. The coarser granularity of iterative deepening results in
an inefficient use of time when using a fixed-time-per-move time control, but
this is partially offset by the savings from not doing even-numbered iterations.

5.2 Evaluation Function Components 

The strength of Mona lies in the evaluation function, which attempts to
assess several properties of strategic importance, in the hope that this information
will more than compensate for the overhead added by the relatively expensive 
computations. 

The most basic evaluation simply determines whether the position is won,
lost, drawn (by simultaneous connection), or unknown. This is done with a
low-overhead breadth-first search to identify each group. If there is only one group of
a given colour, then the game is over. The most important components of the full
evaluation function are centrality, mobility, thickness, and mate threats. Useful
refinements consider outlier mobility, blockades, footholds, outlier blocking,
the progress toward connectivity, and the value of the move.

In most cases, it is the net difference between Black pieces and White pieces 
that is of interest. For example, the program would willingly reduce its own 
mobility provided that the opponent’s mobility is reduced by an even greater 
amount. For the most part the evaluation is symmetric (with the exception of 
outlier mobility, and mate threats). 

Centrality. From a practical programming point of view, centrality is more 
than just a feature - it can be thought of as an overall game plan. The other two 
most important features in Mona’s evaluation function (mobility and
thickness) also have a significant centrality bias. This lends a degree of “harmony”
to the evaluation, in that they are all striving for mutually supportive goals, 
rather than being at odds with each other. And indeed, it is quite common to 
sacrifice some of one commodity in order to gain more of another, in a cyclic 
process that eventually reaches positions that are powerful on all three counts. 

To quantify centrality, each square is assigned a weight corresponding to the
sum of orthogonal distances from the nearest corner (the four corner squares
having a weight of 2, up to the four center squares having a weight of 8). The
net centrality is relative to the number of pieces remaining. Thus, a few pieces 
on squares near the center would have a higher average centrality than a large 
number of pieces scattered about the board. 
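
The square weights and the per-piece average can be sketched as follows. The distance convention (counting the edge rank or file as distance 1) is inferred from the stated endpoints of 2 for corners and 8 for center squares and is therefore an assumption, as are the names `square_weight`, `WEIGHTS`, and `average_centrality`.

```python
SIZE = 8


def square_weight(row, col):
    """Sum of the two orthogonal distances from the nearest corner."""
    row_dist = min(row, SIZE - 1 - row) + 1   # 1 on the edge, 4 in the middle
    col_dist = min(col, SIZE - 1 - col) + 1
    return row_dist + col_dist


WEIGHTS = [[square_weight(r, c) for c in range(SIZE)] for r in range(SIZE)]


def average_centrality(pieces):
    """Centrality relative to the number of pieces remaining."""
    return sum(WEIGHTS[r][c] for r, c in pieces) / len(pieces)


compact = average_centrality([(3, 3), (4, 4)])              # two central pieces
scattered = average_centrality([(0, 0), (3, 3), (7, 0), (0, 7)])
```

With these weights, two pieces in the center score a higher average than four pieces of which three sit in corners, matching the behaviour described above.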

Mobility. A basic measure of mobility would simply count the number of 
moves that can be made. However, some moves are generally better than others. 
As noted above, the static move ordering value is determined by the destination 
centrality, and whether the move captures an enemy piece. By summing over 
those values, the mobility function is naturally biased toward the moves that
are likely to be useful. This is an absolute measure, so having more pieces, and
thus more moves, is generally favourable.

An interesting consequence results from giving capture moves a greater 
weight. This actually discourages even exchanges, because having the
option to make a capture has more value than actually making that capture. Thus,
all else being equal, the program favours building up the pressure on key squares
rather than releasing the tension through an even exchange. This has
considerable practical value, especially in play against humans, because the computer
program can handle the extra burden of many complex continuations much 
better than a human player. The chance of the opponent making a fatal error 
is thus increased in practice. This principle is highly analogous to the famous 
chess adage “the threat is stronger than the execution”. 

Thickness. In general, “clumping” of pieces is desirable, and there is some 
value in having redundant connections, preventing a group from being cut into 
two. However, we want to avoid having groups that are “too heavy”, thereby
reducing their own mobility.

The measure of connectivity used by Mona is called center-thickness,
or simply thickness. A straightforward measure would count the number of
pairwise adjacent pieces on the board. The embellishment used is to weight
each of those pairs according to the centrality of the squares they occupy. This 
is another cumulative measure, so having more pieces is generally favourable. 
Mona uses a zero weight for the material factor, but still exhibits a preference 
for extra pieces, based on mobility and thickness. 
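
A centrality-weighted adjacency count of this kind might look like the sketch below. The text does not say how the two squares' centralities are combined into a pair weight; summing them is an assumption made here purely for illustration, as are the function names and the corner-distance weights (2 for corners up to 8 for center squares).

```python
SIZE = 8


def weight(row, col):
    """Assumed centrality weight: 2 in the corners up to 8 in the center."""
    return (min(row, SIZE - 1 - row) + 1) + (min(col, SIZE - 1 - col) + 1)


def thickness(pieces):
    """Sum a centrality-based weight over every pair of adjacent pieces."""
    pieces = set(pieces)
    total = 0
    for r, c in pieces:
        # Four of the eight directions, so each adjacent pair is counted once.
        for dr, dc in ((0, 1), (1, -1), (1, 0), (1, 1)):
            other = (r + dr, c + dc)
            if other in pieces:
                total += weight(r, c) + weight(*other)  # assumed pair weight
    return total


central_pair = thickness([(3, 3), (3, 4)])   # one pair on center squares
corner_pair = thickness([(0, 0), (0, 1)])    # one pair on the edge
```

Whatever the exact pair weight, the measure is cumulative: every extra adjacent piece adds to the total, which is why the program still values material even with a zero material weight.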

It should be clear that the centrality biases built into mobility and thickness 
will generally encourage pieces to be moved away from the edge; and for groups 
to be formed in the center, if possible. However, this is not a heavy-handed 
bias, and cannot be easily obtained against quality opposition. It is simply a 
preference over other types of moves and piece formations. The relative weights 
of these three evaluation terms were set to have roughly equal contributions, 
with a slight preference for mobility, since it usually has a bit more pragmatic 
value. Very little was done in the way of tuning, and it is unknown if the 
program’s performance could be enhanced significantly with more thorough 
experimentation. 

Mate Threats. Mona computes many useful properties for each position. 
All groups are identified with the breadth-first search described previously. 
Mona designates the largest group to be the main group (choosing arbitrarily 
among equals). 

Since the full move list is also available, it is possible to selectively do a check 
for the case where a single remaining outlier has a move that will put it adjacent 
to the main group, which constitutes an immediate threat to win the game. 
This particular type of mate threat is the most common in practice, and can be
detected without a full extra ply of look-ahead. While it is possible to write a 
special-purpose mate solver that looks only at direct threats and responses, we 
found that most of that utility was accomplished by simply knowing that a threat 
exists. Mona assigns a huge bonus for each such threat, which dominates all 
other evaluation terms. Thus it will always choose the best move among those 
that contain a threat (or highest number of multiple threats). 

This policy is something of a gamble, since the threat may not actually lead 
to a win. However, to date we have only seen two cases where a winning 
position was lost by chasing a specious mate threat (both were against YL and, 
unfortunately, one in an important game). 

Outlier Mobility. Given the rich information maintained about the board 
position, Mona can determine the individual mobility for every piece that is 
not part of the main group. She applies a non-linear penalty to the least mobile 
outlier for each side. Only the single worst outlier is considered. The search will 
naturally uncover combinations of moves that improve more than one outlier. 

The weights for this feature are heavily skewed, making our worst outlier 
more important than the opponent’s worst outlier. The reasoning is that we do 
not want to invest a lot of energy (and evaluation points) on trying to trap an 
opponent piece that might easily escape, leaving us in a weakened position. 
Conversely, it is also risky to allow our own outliers to be trapped, since there 
may be no effective way to solve the problem (especially against humans, who 
can easily visualize such futures). The program is careful to avoid losing a 
game due to such traps, while preferring to build up steadily a strong position 
rather than trying to trap the opponent. This is one of several examples where 
the evaluation is consistent with the natural strengths and weaknesses of the 
program. 

Blockades and Footholds. Mona’s evaluation function expends a lot of 
effort on the analysis of blockades along the second rank (see the center of Figure 1).
These formations arise naturally from the standard starting position, even when 
using only the basic evaluation knowledge. The special purpose evaluation 
assesses the effectiveness of each blockade. Each additional blocking piece has 
a multiplicative effect on the penalty, while the presence of a foothold nullifies 
it almost completely. The program also distinguishes between a blockade of 
the main group and a blockade of outliers, with the latter being more serious. 

Other Features. Since the character of LoA positions changes radically
during the course of the game, it is desirable to alter the overall plan and assessment
to match the prevailing conditions. As a case in point, even the most refined 
positional evaluation is of little use in the final stage of the game - the only 
relevant question is whether we can form a single connected group before the 
opponent does. Empirically, it was found that the value of having the move 
increases steadily toward the end of the game, when having the initiative is
usually decisive. 

The progress toward game completion is actually measured in a variety of 
ways. One of the simplest is to count the number of remaining pieces that are 
not part of the main group. A bonus is given for having two remaining outliers, 
and a larger bonus for having only one (since mate threats are commonly on 
the horizon). However, it is dangerous to try to converge too quickly, as it may 
fail tactically after all of the positional advantage has been sacrificed. 

Many of the strategic properties perceived by Mona’s evaluation function 
cannot possibly be uncovered within a practical search horizon. For example, a 
piece trapped behind a wall of enemy pieces can be identified by static analysis, 
but the consequences of having that piece trapped might not begin to be felt 
until many moves in the future. Mona can add this type of knowledge without 
much down-side, whereas it is difficult to define within the framework of YL, 
and might be prohibitively expensive in any case (resulting in a net decrease in 
performance). 

6. Empirical Results and Experiments 

In this section we first provide insights into the playing strength of YL and 
Mona by reviewing tournament results. Secondly, we investigate the
trade-offs of knowledge versus search in the game of LoA both via self-play and by
matching the two programs against each other. 

6.1 Over-The-Board Competitions 

After a series of mutually beneficial friendly matches against each other, YL 
and Mona made their competitive debut in the University of Alberta Lines of 
Action Open, in April 2000 (Billings, 2002). The tournament was a double 
round-robin format with a time control of 20 seconds per move. YL won with 
a perfect 22-0 score, and Mona finished second at 19-3. 

In August 2000, both programs competed in the Fifth Computer Olympiad 
in London, England. YL won the gold medal, and Mona won the silver. 
Although Mona lost a critical game to YL on time only seconds before proving 
a win, both authors believed YL to be the stronger program at the 30-minute 
per game time constraints. The program MIA, by Mark Winands at the University
of Maastricht, took the bronze medal. YL successfully defended its title at the
Sixth Computer Olympiad in 2001 ahead of MIA-II, and again won the gold
medal at the Seventh Computer Olympiad in 2002 ahead of a steadily improving
MIA-III (Björnsson and Winands, 2002). Mona did not compete in either
event. In July 2002, a four-game friendly match was played between MIA-III
and the original Mona from 2000. Each program won two games, and based 
on further analysis of the moves played, it appeared that MIA-III had largely 
closed the gap that previously existed. 

Rank   Rating   Colour   Name (Country)
1      2763              Mona (Canada) 25 wins, 0 draws, 0 losses
3      2374     W        Jorge Gomez Arrausi (Spain)
4      2202     WB       Claude Chaunier (France)
5      2192     BW       Kerry Handscomb (Canada)
6      2102     W        Uli Vogel (Germany)
7      2086     W        Ragnar Wikman (Finland)
8      2062     W        Hartmut Thordsen (Germany)
10     1999     W        Dave Dyer (USA)
11     1981     BB       Patrick Duff (USA)
13     1919     B        John Bosley (New Zealand)
18     1871     W        Fred Kok (Netherlands)

Table 1. Mona’s e-mail results against the top human players.



6.2 E-mail Correspondence Competitions 

At deeper search depths, Mona’s strength increases dramatically. However, 
due to the expensive evaluation function, it can take a few hours to complete 
11-ply early in the game, or 13-ply in the middlegame. This makes Mona 
particularly well-suited to e-mail correspondence play, with a pace of roughly 
one move per day. 

In the summer of 2000, Mona began playing on Richard’s PBeM
server (a popular play-by-email service) against many of the strongest known
human players, winning every game played. Mona then won the Fifth Annual
E-mail Tournament (the de facto world championship) with a perfect 14-0
record, including wins over most of the best LoA players in the world.

Table 1 lists some of the e-mail games played by Mona from May 2000 
to May 2001. The chess-style ratings were calculated independently, using 
iterative re-computation over a database of more than 1000 PBeM LoA games 
until reaching convergence. The #2 rated correspondence player, at 2417, is 
the program MIA, which has only lost to the top human player, Jorge Gomez 
Arrausi. Jorge Gomez Arrausi won the 2000 e-mail championship, and was the 
top finishing human again in 2001, losing only to Mona. Several of the players 
listed are former LoA medalists at the Mind Sports Olympiad, including Fred 
Kok (gold twice), Hartmut Thordsen (gold), Ragnar Wikman (silver twice), and 
John Bosley (bronze). 

Mona had the second move in most of the games against the top players, 
which is believed to be a larger disadvantage than having the Black pieces in 
chess. Mona also used considerably less time than her human opponents. In 
the final round of the 2001 e-mail tournament, Mona used an average elapsed 
time of 3.2 days per game, while her opposition used an average of 42.7 days 
per game against her. Based on the perfect record against elite competition, it is 
safe to conclude that the playing strength of Mona exceeds that of all human 
players by a considerable margin. However, it should be noted that LoA is 
still a young game, growing in popularity, and it is possible that “grandmaster” 
calibre players could emerge in the future, giving programs a tougher challenge
than has been seen to date. Programs will also continue to improve, and as 
they help humans to deepen their understanding of the game, that in turn could 
provide new knowledge to be added to future programs. 

In April 2001, an 8-game match was played between older versions of Mona 
and YL, using the correspondence time control of 8 hours per move. Each game 
took roughly one week to complete. It is likely that these games constituted the 
highest level of play ever attained in LoA at that time. Mona won the match 
convincingly, with 7 wins and 1 loss. We intend to repeat this experiment using 
a more recent (and much stronger) version of YL. 

6.3 Knowledge versus Search Experiments

In general, the farther a program looks ahead, the better it plays. This is the
main justification for designing fast-searching game-playing programs.
Historically, deeper search in chess programs has always led to significant
improvements in performance. However, experimental studies have demonstrated
diminishing returns with additional search depth (Junghanns and Schaeffer, 1997;
Heinz, 2001).

We are also interested in investigating the importance of knowledge as LoA 
programs are given more time to think, since this will be a good predictor of 
how faster hardware platforms in the future will affect a program’s playing 
strength. Traditionally, such investigations have involved a series of self-play 
experiments. 

Constant Knowledge (Self-play) Experiments. First we repeated the most 
common self-play experiments, using YL and Mona. The results of those 
matches are shown on the left in Table 2. Each data point is the outcome 
of a 200-game match. A standardized set of 100 3-ply openings was defined 
(available upon request), and each player played both sides of each opening. 

As in chess, searching deeply is obviously important: the deeper searching
program invariably outperforms the shallower searching program by a
considerable margin. However, as the search depth increases, the winning margin
decreases, supporting the aforementioned experimental results found in the
domain of chess.

Also of interest is that YL appears to benefit more from the deeper search
than Mona. As noted in previous discussion, some of the knowledge in Mona
directly compensates for the shallower search; whereas YL must depend on the
deeper search to actually witness certain short-term tactics, refining its minimax
evaluation of the position. The results of 200-game matches alone are not
conclusive, but the same trend has been observed for other experiments as well.

Depth      YL vs YL   Mona vs Mona     Time(sec)   YL vs YL01   YL vs Mona
5 vs 7     89.75      79.50            2           77.50        81.00
7 vs 9     85.75      78.00            8           79.75        79.25
9 vs 11    79.75      72.50            32          83.25        80.25
11 vs 13   79.00      -                128         81.00        65.75
13 vs 15   72.75      -

Table 2. Fixed and variable knowledge experiments.

Varying Knowledge Experiments. The self-play experimental setup only 
shows the benefits of additional search when playing against a very similar 
program. This does not address the possible effects of knowledge. The
experiments reported above might be misinterpreted as implying that YL would benefit
more than Mona from faster hardware - the exact opposite is true!

To test the knowledge differences directly, YL was also played against the 
2001 version, labeled YL01. The more recent version was greatly improved 
with the addition of several types of knowledge and evaluation features, but 
the two YL programs are almost identical in search capacity, due to the
pre-calculated evaluation tables described earlier. Although Mona has not
undergone any significant changes since the 2000 Computer Olympiad, it is still
regarded as having the best-informed evaluation function of the three programs.

The results of those matches are shown to the right in Table 2. At shorter 
time controls, YL outperforms both YL01 and Mona by a similar winning 
margin. As the time controls get longer, YL continues to outperform YL01 at 
a comparable win rate, indicating that the improvements in knowledge (the only 
significant difference between the two programs) continue to pay dividends as 
search depth increases. 

However, the win rate against Mona drops off dramatically once the latter 
is given at least two minutes per move, despite the fact that YL continues to 
outsearch Mona by almost 3-ply on average. (This trend continues at longer 
time controls, but those experiments were not complete and are not shown here). 
Presumably, the fast search engine is gaining less and less from deeper search, 
while the knowledge advantage continues to provide sustainable benefits. 

Further evidence is seen in matches between Mona and YL with equal 
fixed depths (using comparable sets of search enhancements). Whereas Mona 
won about 55% of games at 5-ply, 7-ply, and 9-ply (54.50%, 55.25%, and 
56.00%, respectively), her win rate increased to 71.00% when searching 11-ply 
per move. 

The implication of these observations is that faster hardware platforms do 
indeed benefit knowledge-rich programs more than fast searchers. This in turn 
suggests that when developing strategic game-playing programs, time invested 
on improving the program’s board evaluation will generally pay greater div- 
idends in the long run than effort spent on search improvements (especially 
in view of the ever increasing difficulty in obtaining significant improvements 
in search efficiency). Similar behaviour has been observed in chess programs 
(Berliner et al., 1990). 

7. Conclusions 

We have revisited some long-standing questions regarding the roles of search 
and knowledge in high-performance game-playing programs, using two cham- 
pion Lines of Action programs which emphasize different aspects of these 
contrasting approaches. Although the experiments are far from exhaustive, the 
results obtained so far are entirely consistent with previous studies for the game 
of chess. This supports the view that these are general phenomena, rather than 
game-specific. 

By considering the effects of the knowledge level of game-playing programs 
over increasing search depths, it is possible to get a glimpse of what will likely 
be seen in the future. Although it is clear that search depth is and will continue to 
be very important, there are definite indications that increasing the knowledge 
holds the greater promise for lasting improvements in performance. 

References 

Beal, D. (1989). Experiments with the null move. In Advances in Computer Chess 5, pages 
65-89. Elsevier Science Publishers B.V. D. Beal (editor). 

Berliner, H., Goetsch, G., Campbell, M. S., and Ebeling, C. (1990). Measuring the performance 
potential of chess programs. Artificial Intelligence, 43(1):7-20. 

Billings, D. (2000). http://www.cs.ualberta.ca/~games/LOA/. 

Bjornsson, Y. and Marsland, T. (2001). Multi-cut alpha-beta pruning in game-tree search. 
Theoretical Computer Science, 252:177-196. 

Bjornsson, Y. and Winands, M. (2002). YL wins Lines of Action tournament. ICGA Journal, 
25(3):185-186. See also: 23(3):179-179, and 24(3):180-181. 

Buro, M. (1999). From simple features to sophisticated evaluation functions. In Proceedings of 
The 1st International Conference on Computers and Games (CG ’98). LNCS special issue 
Computers and Games, pages 126-145. Springer-Verlag, Berlin, Germany. 

Dyer, D. (2003). Lines of action, http://www.andromeda.com/people/ddyer/loa/. 

Handscomb, K. (2003). Abstract Games, http://www.abstractgamesmagazine.com. 

Heinz, E. (2000). Scalable search in computer chess. Vieweg Verlag, Germany. 

Heinz, E. (2001). Self-play, deep search and diminishing returns. ICCA Journal, 24(2):75-79. 

Junghanns, A. and Schaeffer, J. (1997). Search versus knowledge in game-playing programs 
revisited. In IJCAI-97, pages 692-697. 

Marsland, T. A. (1982). Relative performance of the alpha-beta algorithm. ICCA Journal, 
5(2):21-24. 

Sackson, S. (1969). A Gamut of Games. Random House. 

Sakuta, M., Hashimoto, T., Nagashima, J., and Iida, H. (2002). Endgame-search techniques 
developed in shogi: application to Lines of Action. In Caulfield, H. et al., editor, Proceedings 
of JCIS 2002, pages 458-460. 

Schaeffer, J. (1986). Experiments in search and knowledge. Ph.D. thesis, Department of 
Computing Science, University of Waterloo, Canada. 

Winands, M.H.M., Uiterwijk, J.W.H.M., and Van den Herik, H.J. (2002). PDS-PN: A new 
proof-number search algorithm: Application to Lines of Action. In Proceedings of The 3rd 
International Conference on Computers and Games (CG ’02). To appear. 




AN EVALUATION FUNCTION FOR LINES 
OF ACTION 



M.H.M. Winands, H.J. van den Herik, J.W.H.M. Uiterwijk 

Institute for Knowledge and Agent Technology, Department of Computer Science, 
Universiteit Maastricht, P.O. Box 616, 6200 MD Maastricht, The Netherlands 
{m.winands,herik,uiterwijk}@cs.unimaas.nl, http://www.cs.unimaas.nl/m.winands/ 



Abstract Lines of Action (LOA) is a two-person zero-sum chess-like connection game. 
Building an evaluation function for LOA is a difficult task because not much 
knowledge about the game is available. In this paper the evaluation function 
of the tournament program MIA is explained. This evaluator consists of the 
following nine features: concentration, centralisation, centre-of-mass position, 
quads, mobility, walls, connectedness, uniformity, and player to move. These 
features have resulted in the evaluator MIA IV. The evaluator is tested in a 
tournament against other LOA evaluators, which have performed well at the previous 
Computer Olympiads. Experiments show that MIA IV defeats them with large 
margins. It turns out that the evaluator even performs better at deeper searches. 

Keywords: Lines of Action, evaluation function, MIA 

1. Introduction 

LOA is a two-person zero-sum game with perfect information; it is a chess-like 
game with a connection-based goal, played on an 8 x 8 board. LOA was 
invented by Claude Soucie around 1960. Sid Sackson (1969) described it in the 
first edition of A Gamut of Games. After this publication, LOA received some 
attention from AI researchers. For instance, the first LOA program was written at 
the Stanford AI laboratory around 1975 by an unknown author. In the 1980s and 
1990s “hobby” programmers wrote several LOA programs. However, all were 
beatable by humans (Dyer, 2000). At the end of the nineties LOA again became 
a target of AI researchers. Some of them used LOA only as a test domain for 
their algorithms; others tried to build strong LOA programs by using new ideas. 
The programs YL, Mona and MIA (Maastricht In Action) belong to the latter 
category. MIA finished third, second and again second at the fifth, sixth and 
seventh Computer Olympiad, respectively (Bjornsson, 2000; Bjornsson and 
Winands, 2001; Bjornsson and Winands, 2002). The program can be played 
online at the website: http://www.cs.unimaas.nl/m.winands/loa/. 

The standard framework of the αβ search with its enhancements offers a 
good start for building a strong game-playing program. The real challenge in 
LOA is building a decent evaluation function, which incorporates the strategic 
intricacies of the game. The difficulty lies in the fact that knowledge about LOA 
evaluation functions is not well developed, although some material on this topic 
has been published (Winands et al., 2001). In this paper we discuss the latest 
evaluation function used in the program MIA. 

The remainder of this paper is organised as follows. Section 2 explains 
the game of Lines of Action and describes the search engine. In Section 3 
the evaluation function is explained. This evaluation function is tested against 
other evaluators in Section 4. Finally, in Section 5 we present our conclusions 
and topics for future research. 

2. Test Environment 

In this section we explain first the game of Lines of Action. Next, the search 
engine of MIA is described briefly. 

2.1 Lines of Action 

LOA is played on an 8 x 8 board by two sides, Black and White. Each side has 
twelve pieces at its disposal. The players alternately move a piece, starting with 
Black. A move takes place in a straight line, exactly as many squares as there 
are pieces of either colour anywhere along the line of movement. A player may 
jump over its own pieces. A player may not jump over the opponent’s pieces, 
but can capture them by landing on them. The goal of a player is to be the first 
to create a configuration on the board in which all own pieces are connected in 
one unit. The connections within the unit may be either orthogonal or diagonal. 
In the case of simultaneous connection, the game is drawn. If a player cannot 
move, this player has to pass. If a position with the same player to move occurs 
for the third time, the game is drawn. 
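
The movement rule above can be sketched in a few lines of Python. This is only an illustration, not MIA's move generator: the board representation (a dictionary from 0-based (x, y) squares to 'B' or 'W') and the function name are assumptions, and the check that a piece may not land on a piece of its own colour is taken from the standard LOA rules rather than from the text above.

```python
def legal_destination(board, x, y, dx, dy):
    """Return the landing square for the piece on (x, y) moving in direction
    (dx, dy), or None if the move is illegal.

    board: dict mapping (x, y) -> 'B' or 'W' on an 8 x 8 board (hypothetical
    representation chosen for this sketch).
    """
    me = board[(x, y)]

    # Count all pieces of either colour on the whole line of movement,
    # in both directions, including the moving piece itself.
    count = 1
    for sign in (1, -1):
        cx, cy = x + sign * dx, y + sign * dy
        while 0 <= cx < 8 and 0 <= cy < 8:
            if (cx, cy) in board:
                count += 1
            cx, cy = cx + sign * dx, cy + sign * dy

    # The piece moves exactly `count` squares.
    tx, ty = x + count * dx, y + count * dy
    if not (0 <= tx < 8 and 0 <= ty < 8):
        return None

    # Own pieces may be jumped over; opponent pieces may not.
    for step in range(1, count):
        square = (x + step * dx, y + step * dy)
        if square in board and board[square] != me:
            return None

    # Landing on an opponent piece captures it; landing on an own piece
    # is not allowed (standard LOA rule, assumed here).
    if board.get((tx, ty)) == me:
        return None
    return (tx, ty)


def initial_position():
    """Standard LOA start: Black on the top and bottom rows, White on the sides."""
    board = {}
    for i in range(1, 7):
        board[(i, 0)] = 'B'
        board[(i, 7)] = 'B'
        board[(0, i)] = 'W'
        board[(7, i)] = 'W'
    return board
```

In the initial position the black piece on b1 (square (1, 0) in this sketch) can move two squares up its file, because that file holds two pieces, or six squares along the first rank, jumping over its own pieces.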

Analysis of 2585 self-play matches showed an average branching factor 
of 29 and an average game length of 44 ply. The state-space complexity and 
game-tree complexity are estimated to be O(10^23) (Winands et al., 2001) and 
O(10^64), respectively. A characteristic property of LOA is that it is a converging 
game (Allis, 1994), since the initial position consists of 24 pieces, and during the 
game the number of pieces (usually) decreases. However, since most terminal 
positions still have more than 10 pieces remaining on the board (Winands, 
2000), endgame databases are (probably) not effectively applicable in LOA. As 
a case in point, we remark that an endgame database of ten pieces would require 
approximately 10 terabytes. Finally, in LOA the standard chess notation for 
moves is used. 
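
As a rough check on the order of magnitude, the game-tree size is commonly approximated by raising the average branching factor to the power of the average game length. The short sketch below applies that approximation to the two figures quoted above; the function name is ours and the approximation is a general rule of thumb, not necessarily the method used by the authors.

```python
import math


def game_tree_exponent(branching_factor, game_length_ply):
    """Base-10 exponent of branching_factor ** game_length_ply."""
    return game_length_ply * math.log10(branching_factor)


# Figures quoted in the text: branching factor 29, game length 44 ply.
exponent = game_tree_exponent(29, 44)
print(f"29^44 is roughly 10^{exponent:.0f}")
```

The exponent comes out between 64 and 65, which agrees with the estimate of O(10^64) for the game tree.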







2.2 MIA’s Search Engine 

MIA performs an αβ depth-first iterative-deepening search. Several 
techniques are implemented to make the search efficient. The program uses PVS 
(Principal Variation Search) to narrow the αβ window as much as possible 
(Marsland and Campbell, 1982). A two-deep transposition table (Breuker et al., 
1996) is applied to prune a subtree or to narrow the αβ window. At all interior 
nodes which are more than 2 ply away from the leaves, the program generates 
all the moves to perform the Enhanced Transposition Cutoffs (ETC) scheme 
(Schaeffer and Plaat, 1996). 

Next, a null move (Donninger, 1993) is performed before any other move and 
it is searched to a lower depth (reduced by R) than other moves. In the search 
tree we distinguish three types of nodes, namely PV nodes, CUT nodes, and 
ALL nodes (Knuth and Moore, 1975; Marsland and Campbell, 1982). The null 
move is done at CUT nodes and at ALL nodes. At a CUT node a variable scheme, 
called adaptive null move (Heinz, 1999), is used to set R. If the remaining depth 
is more than 6, R is set to 3. When the number of pieces of the side to move is 
lower than 5 the remaining depth has to be more than 8 for setting R to 3. In all 
other cases R is set to 2. For ALL nodes R = 3 is used. If the null move does 
not cause a β-cut, multi-cut (Bjornsson and Marsland, 1999) is performed. 
Experiments showed that using multi-cut is not only beneficial at CUT nodes 
but also at ALL nodes (Winands et al., 2003). 

For move ordering, the move stored in the transposition table, if applicable, is 
always tried first. Next, two killer moves (Akl and Newborn, 1977) are tried. 
These are the last two moves which were best or at least caused a cut-off at the 
given depth. Thereafter follow: (1) capture moves going to the inner area (the 
central 4x4 board) and (2) capture moves going to the middle area (the 6x6 
rim). All the other moves are ordered decreasingly according to their scores in 
the history table (Schaeffer, 1983). In the leaf nodes of the tree a quiescence 
search is performed. This quiescence search looks at capture moves which 
form or destroy connections (Winands et al., 2001) and at capture moves going 
to the central 4x4 board. 
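
The rule for choosing the null-move reduction R can be written down compactly. The sketch below follows the thresholds stated above; the function name, the string labels for the node types, and the convention of returning None when no null move is tried are assumptions made for illustration.

```python
def null_move_reduction(node_type, remaining_depth, pieces_side_to_move):
    """Return the depth reduction R for the null move, or None if no null
    move is tried at this node.

    node_type: "PV", "CUT" or "ALL" (hypothetical labels for this sketch).
    """
    if node_type == "PV":
        # The text applies the null move only at CUT and ALL nodes.
        return None
    if node_type == "ALL":
        return 3
    # CUT node: adaptive null move. With fewer than five pieces for the
    # side to move, a larger remaining depth is required before R = 3.
    threshold = 8 if pieces_side_to_move < 5 else 6
    return 3 if remaining_depth > threshold else 2
```

For example, at a CUT node with remaining depth 7 the reduction is 3 when the side to move has ten pieces, but only 2 when it has four.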

3. Evaluation Function 

In this section the evaluation function of MIA is explained. This evaluator 
consists of the following nine features: concentration, centralisation, 
centre-of-mass position, quads, mobility, walls, connectedness, uniformity, and 
player to move. These features are described below in detail (Subsections 3.1 to 
3.9), followed by some information about the use of caching (Subsection 3.10). 

Figure 1. (a) Scattered Pieces (b) Position with two black Q4's. 



3.1 Concentration 

The concentration of the pieces is calculated by a centre-of-mass approach. In 
MIA this is done in four steps. First, the centre of mass of the pieces on the board 
is computed for each side. Second, we compute for each piece its distance to the 
centre of mass. The distance is measured as the minimal number of squares from 
the piece to the centre of mass. These distances are summed; the result is called 
the sum-of-distances. Third, the sum-of-minimal-distances is looked up in a table. 
It is defined as the sum of the minimal distances that the given number of pieces 
can have from the centre of mass. This number is necessary since otherwise 
boards with a few pieces would be preferred. For instance, if we have ten pieces, 
there will always be eight pieces at a distance of at least 1 from the centre of 
mass, and one piece at a distance of at least 2. In this case the 
sum-of-minimal-distances is 10. The sum-of-minimal-distances is subtracted 
from the sum-of-distances, the result being called the surplus-of-distances. 
Fourth, we calculate the concentration, defined as the inverse of the average 
surplus-of-distances. By doing so we reward positions with pieces in the 
neighbourhood of each other, so that eventually 
they will be connected in solid formations or they will create threats to win. 
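
The four steps can be illustrated with a small Python sketch. Several details are assumptions of ours and are not given in the text: the distance is taken to be the number of king steps (Chebyshev distance), the centre of mass is rounded to the nearest square, and 1 is added to the average surplus so that a perfectly compact group does not cause a division by zero. The function names are hypothetical.

```python
def min_distance_sum(n):
    """Smallest possible sum of distances for n pieces around one square:
    one piece on the centre, up to 8 at distance 1, up to 16 at distance 2,
    and so on (board edges are ignored in this sketch)."""
    total, ring, remaining = 0, 0, n
    while remaining > 0:
        capacity = 1 if ring == 0 else 8 * ring
        placed = min(capacity, remaining)
        total += placed * ring
        remaining -= placed
        ring += 1
    return total


def concentration(pieces):
    """pieces: list of (x, y) squares of one side (must not be empty)."""
    n = len(pieces)
    # Step 1: centre of mass, rounded to a square (assumption).
    cx = round(sum(x for x, _ in pieces) / n)
    cy = round(sum(y for _, y in pieces) / n)
    # Step 2: sum of distances, measured in king steps (assumption).
    sum_of_distances = sum(max(abs(x - cx), abs(y - cy)) for x, y in pieces)
    # Step 3: subtract the unavoidable minimum for this number of pieces.
    surplus = max(0, sum_of_distances - min_distance_sum(n))
    # Step 4: inverse of the average surplus; the +1 is our own guard
    # against division by zero.
    return 1.0 / (1.0 + surplus / n)
```

With ten pieces the sketch reproduces the sum-of-minimal-distances of 10 mentioned in the text, and a compact 3 x 3 block obtains the maximal concentration.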

3.2 Centralisation 

In this feature each piece gets a value dependent on its board square. 
Pieces at squares closer to the centre are given higher values than the 
ones farther away. Pieces at the edge are given a negative value. This is done 
because such pieces are easy to block by a wall (see Subsection 3.6). Pieces 
at the corner are punished even more severely. To prevent the program from 
over-aggressively capturing pieces, the average is computed instead of the sum 
of piece values. 

3.3 Centre-of-mass Position 

In earlier versions of MIA positions with a somewhat more centralised 
centre-of-mass were slightly preferred. The idea was to prevent formations from 
being built on the edges, where they are more easily destroyed or blocked. 
Interestingly, after applying Temporal-Difference (TD) learning the weight for the 
centralised centre-of-mass feature changed its sign (Winands et al., 2002), 
which means that, contrary to expectations, it is good to have the centre-of-mass 
closer to the edge instead of in the centre. If the centre-of-mass is in the centre, 
it is possible that pieces are scattered over the board (e.g., the white pieces in 
Figure 1a). If the centre of mass is at the edge, pieces have to be in the 
neighbourhood of each other, otherwise they would lie outside the board. Another 
plausible explanation of why it is worse to have the main piece formation in 
the centre is that it can be more easily attacked there, whereas groups residing 
closer to the edge can only be attacked from one side. 

3.4 Quads 

The use of quads for a LOA evaluation function was first proposed and 
implemented by Dave Dyer in 1996 in his program LoaJava and empirically 
evaluated by Winands et al. (2001). This feature counts certain quad types. 
A quad is defined as a 2x2 array of squares (Gray, 1971). In this feature we 
only consider quads of three (Q3) or four pieces (Q4) of the same colour, since 
it is impossible to destroy these formations by a single capture. However, the 
danger exists that many of those quads are created outside the neighbourhood 
of the centre of mass. So, in MIA we have rewarded only Q3's and Q4's which 
are at a distance of at most two from the centre of mass. For instance, Black has 
two Q4's in Figure 1b. 
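
A minimal sketch of such a quad count is given below. It is not MIA's implementation: the distance of a quad to the centre of mass is taken here as the smallest king-step distance of any of its four squares, which is our own assumption, and the function name and coordinate convention (0-based (x, y) squares) are hypothetical.

```python
def count_rewarded_quads(own, centre, max_dist=2):
    """Count quads with three (Q3) and four (Q4) pieces of one colour that
    lie within max_dist of the centre of mass.

    own:    set of (x, y) squares occupied by the side, 0 <= x, y <= 7
    centre: (x, y) square of that side's centre of mass
    Returns a pair (number of Q3's, number of Q4's).
    """
    q3 = q4 = 0
    # Only quads lying entirely on the board can hold three or four pieces.
    for x in range(7):
        for y in range(7):
            squares = [(x, y), (x + 1, y), (x, y + 1), (x + 1, y + 1)]
            pieces = sum(sq in own for sq in squares)
            if pieces < 3:
                continue
            # Distance of the quad to the centre of mass (assumed measure).
            dist = min(max(abs(sx - centre[0]), abs(sy - centre[1]))
                       for sx, sy in squares)
            if dist <= max_dist:
                if pieces == 3:
                    q3 += 1
                else:
                    q4 += 1
    return q3, q4
```

A solid 2 x 2 block next to the centre of mass yields one Q4; attaching a fifth piece beside it adds one Q3, while a triple far from the centre of mass is not counted.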

3.5 Mobility 

In the mobility feature the number of moves for each side is computed. 
This feature was first implemented in Mona and YL. In previous evaluation 
functions of MIA all moves were weighted equally. However, experiments have 
shown that certain move types are better than others (see also Hashimoto et al., 
2003). Therefore, in MIA the following bonus/malus system is applied: the 
value of a capture move is doubled, the value of a move going to an edge or 
a move along an edge is halved. If a move belongs to multiple categories, the 
bonus/malus system is used multiple times. For example, let us assume that a 
regular move gets value 1, then a capture move gets value 2, a capture move 
going to an edge gets value 1, a capture move in an edge line going to a corner 
gets value 0.5. The computational requirements of this component are not high. 
For each line configuration of pieces (represented as a bit vector) the mobility 
can be precomputed and stored in a table. During the search, the index scheme 
can be updated incrementally and in the evaluation function only a few table 
lookups have to be done. 
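
The bonus/malus system amounts to multiplying a base value by one factor per category. The sketch below reproduces the worked example from the text; the function name and the three boolean flags are hypothetical, and how a real move generator decides which flags apply to a given move is left open.

```python
def move_weight(is_capture, to_edge, along_edge):
    """Weight of a single move under the bonus/malus system.

    Each flag names one category from the text; a move that belongs to
    several categories has every matching factor applied.
    """
    weight = 1.0              # value of a regular move
    if is_capture:
        weight *= 2.0         # capture moves are doubled
    if to_edge:
        weight *= 0.5         # moves going to an edge are halved
    if along_edge:
        weight *= 0.5         # moves along an edge are halved
    return weight


def mobility(moves):
    """Sum of move weights for one side; moves is a list of flag triples."""
    return sum(move_weight(*flags) for flags in moves)
```

With these factors a capture scores 2, a capture going to an edge scores 1, and a capture in an edge line going to a corner (both edge flags set) scores 0.5, as in the example above.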

3.6 Walls 

Because a piece is not allowed to jump over the opponent’s pieces, it can 
happen that the piece is blocked, i.e., cannot move. Blocking a piece far away 
from the other pieces is an effective way of preventing the opponent from winning. 
Even partial blocking, called a wall (Handscomb, 2000), can be quite effective, 
since it forces a player to find a way around the wall. Detecting whether a piece 
is (partially) blocked can be expensive as we have to know what the moves of 
the piece are and what its goal is. In MIA we look only at walls that prevent 
the opponent’s edge pieces from moving toward the centre. These walls are 
quite common and effective. The patterns can be precomputed and therefore 
are easy to detect. For example, in Figure 2a the piece on a4 is blocked in 
three ways going to the centre, whereas the piece on h4 is only blocked in two 
centre directions. In the evaluator, we distinguish between walls which block 
two or three centre directions. We also remark that we take special care of 
walls which block corner pieces. For example, the piece on h8 is blocked only 
in two directions, but we evaluate this position as if it were blocked in three centre 
directions. The totally isolated piece on a8 is evaluated as if there were two 
walls which both block the piece in three directions. We only look at certain 
blocking patterns for edge pieces. For example, the pieces on b1 and c1 are 
completely blocked, but we take only the two blocks of three centre directions 
into account. It is a subject of future research to incorporate more patterns of 
this kind. 



3.7 Connectedness 

Although the concentration component and quad component favour solid 
formations in the centre, there is still room for a component which determines 
the connectedness of a side. In MIA we compute the average number of connec- 
tions of a piece. In some evaluation functions the total number of connections is 
taken into account (e.g., YL), but this could implicitly be a material advantage. 
Any kind of material component in LOA evaluation functions is always tricky 
because the program might wildly capture pieces. This feature does not take 
into account whether a connection is important or not. To distinguish this, a 
global look at the board would be needed, which is time consuming. The num- 
ber of connections for each side in each line configuration can be precomputed 
as is done with the mobility component. 
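
As an illustration, the average number of connections per piece can be computed directly from the set of occupied squares. The sketch below does this without the precomputed line tables mentioned above; the function name and the coordinate convention (0-based (x, y) squares) are assumptions.

```python
def average_connections(own):
    """Average number of orthogonally or diagonally adjacent own pieces
    per piece. own is an iterable of (x, y) squares of one side."""
    own = set(own)
    if not own:
        return 0.0
    total = 0
    for x, y in own:
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                if (dx, dy) != (0, 0) and (x + dx, y + dy) in own:
                    total += 1
    return total / len(own)
```

Because the value is an average, capturing pieces does not raise it by itself: a solid 2 x 2 block scores 3.0 whether or not other pieces of that side remain on the board in similar blocks.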

Figure 2. (a) Position with walls (b) Position with an outlier on b8. 



3.8 Uniformity 

The disadvantage of the centre-of-mass approach is that it aims to connect as 
many pieces as possible in a local group, hardly worrying about some remote 
pieces (orphans). It is sometimes hard to connect these orphans. For instance, 
in Figure 2b the black pieces are grouped around e2, but the black piece on b8 
is rather far away from this group. To prevent one or more pieces from becoming 
too remote from the main group, a feature is used which aims at a uniform 
distribution (Chaunier and Handscomb, 2001) to counterbalance the negative 
effects of the centre-of-mass approach. In our program this is done in a way 
which is primitive but effective. The area of the distributed pieces is computed, 
assuming it is a rectangle. The smaller the area is, the higher the reward is. An 
analogous implementation was first done in the program YL, but details are not 
known. 
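
A minimal sketch of the rectangle computation follows. The text states only that the area of the rectangle spanned by the pieces is computed and that smaller is better; the function name is hypothetical, and how the area is turned into a reward is left open here.

```python
def bounding_area(own):
    """Area (in squares) of the smallest axis-parallel rectangle that
    contains all pieces of one side. own is a non-empty list of (x, y)."""
    xs = [x for x, _ in own]
    ys = [y for _, y in own]
    return (max(xs) - min(xs) + 1) * (max(ys) - min(ys) + 1)


# Hypothetical example: a compact group plus one remote piece.
group = [(3, 1), (4, 1), (4, 2), (5, 1)]
with_orphan = group + [(1, 7)]
print(bounding_area(group), bounding_area(with_orphan))
```

A single remote piece enlarges the rectangle considerably, so a position with an orphan receives a clearly lower reward than the same group without it.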

3.9 Player to Move 

In the search tree not every leaf node has the same player to move. A small 
bonus is given to the moving side, since having the initiative is mostly an 
advantage in LOA (Winands, 2000) like in many other games (Uiterwijk and 
van den Herik, 2000). 

3.10 Caching 

It is possible in our evaluation function to cache computations of certain 
features, which can be used in other positions. Let us assume that we investigate 
the move b8-c8 in Figure 2b and evaluate the resulting position. If we next 
investigate b8-b7 we notice that certain properties of White’s position remain 
the same (e.g., the number of pieces, centre-of-mass, the number of 
connections), whereas others can change (e.g., moves, blockades). It is easy to 
see that we do not have to compute the concentration, centralisation, position 
of the centre-of-mass, quads, connection, and uniformity for White again. 
Evaluations of components which are independent of the position of the other 
side are stored in the evaluation cache table. In the current evaluation function 
this gives 
a speed-up of at least 60 percent in the number of nodes investigated per second. 
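
The idea can be sketched as a table keyed by the piece configuration of one side only. The class below is an assumption-laden illustration, not MIA's cache: it uses a Python dictionary keyed by a frozenset of squares, whereas a tournament program would more likely use a fixed-size hash table indexed by a bitboard or hash key.

```python
class SideFeatureCache:
    """Caches the features that depend only on one side's own pieces."""

    def __init__(self, compute_side_features):
        # compute_side_features: function from a frozenset of (x, y)
        # squares to the summed value of the side-only features
        # (hypothetical interface for this sketch).
        self._compute = compute_side_features
        self._table = {}
        self.hits = 0

    def get(self, own_pieces):
        key = frozenset(own_pieces)
        if key in self._table:
            self.hits += 1
        else:
            self._table[key] = self._compute(key)
        return self._table[key]


# Usage: White's configuration is identical after two different black
# moves, so its side-only features are computed once.
cache = SideFeatureCache(lambda pieces: float(len(pieces)))
white = [(0, 1), (0, 2), (7, 3)]
first = cache.get(white)
second = cache.get(list(reversed(white)))
```

Features such as mobility and walls depend on both sides and therefore cannot be served from this table.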

4. Experiments 

In order to quantify the improvements of the evaluation function, we played 
a round-robin tournament in which evaluators from earlier tournament versions 
of the program participated. All evaluators used the current search engine, 
described in Subsection 2.2. The evaluators are explained in Subsection 4.1. 
The results are described in Subsection 4.2. 

4.1 Benchmark Evaluators 

The benchmark evaluation functions are described below. 

Evaluator: MIA I The core of this evaluation function is the centre-of-mass 
approach. The quad feature is also implemented. Pieces at the edge are given a 
negative bonus. Contrary to MIA IV a bonus is given for a centralised 
centre-of-mass (Winands et al., 2001). The weights of the features were carefully 
hand-tuned. In retrospect this evaluator was primitive, although it won a game 
against both Mona and YL at the fifth Computer Olympiad (Bjornsson, 2000). 

Evaluator: MIA II The major change of this evaluation function compared 
to the previous one is the introduction of the mobility component. There is no 
discrimination in rewarding different move types. In this evaluator pieces at a 
corner are punished more severely. Using this evaluator the tournament 
program shared the first place with YL in the regular tournament at the sixth 
Computer Olympiad. The play-off match was won by YL (Bjornsson and 
Winands, 2001). 

Evaluator: MIA III This evaluation function is enhanced with the wall 
feature. The centralisation feature is improved by rewarding pieces in the centre. 
A bonus is given for the player to move. The major improvement was retuning 
all the weights by using TD-learning (Winands et al., 2002). There were three 
major changes in the weights. First, the initial weight of the dominating 
centre-of-mass was decreased to one tenth of its original value, indicating that we 
had overestimated the importance of this feature. Second, the weight for the 
centralised centre-of-mass feature changed its sign, which means that, contrary 
to expectations, it is good to have the centre-of-mass closer to the edge instead 
of in the centre. Third, the weight of the centralisation component increased 
the most, indicating that we had underestimated the importance of this feature. 
Using this evaluator the tournament program finished second at the seventh 
Computer Olympiad (scoring 1.5 points out of 4 against the much improved 
winner YL) (Bjornsson and Winands, 2002). An exhibition match was played 
against Mona during the Third International Conference on Computers and 
Games 2002 (CG’02), which ended in a 2-2 tie (Billings and Bjornsson, 2002). 

Evaluator: MIA IV This evaluation function incorporates all features as 
described in Section 3. The centralisation, wall, and player-to-move features 
used the same weights as the ones in MIA III. All the weights of the other features 
were basically found by using TD-learning. Some of them were adjusted by 
hand afterwards. 

An overview of the separate features as used in the four evaluators is given 
in Table 1. Note that the weights and details of the features may differ between 
different evaluators. 





                   MIA I   MIA II   MIA III   MIA IV
Concentration        X       X         X         X
Centralisation       X       X         X         X
C.o.m. position      X       X         X         X
Quads                X       X         X         X
Mobility                     X         X         X
Walls                                  X         X
Connectedness                                    X
Uniformity                                       X
Player to move                         X         X

Table 1. Overview of the features. 



4.2 Results 

The evaluators, previously described, played 1000 matches against each other 
in a round-robin tournament. They always started from the same 10 positions 
given in the Appendix, playing with both colours. To prevent the programs 
from playing the same games over and over again, a sufficiently large random 
factor was included in each evaluation function. 

Fixed-depth searches were used instead of time controls. At first sight 
it may look as if we are favouring the more advanced evaluators (i.e., they are 
time intensive because of the extra knowledge). This is not a problem for two 
reasons. First, the difference in speed is quite moderate. The program runs only 
15 per cent slower with the MIA IV evaluator than with the MIA I evaluator. 
All the evaluators have to compute the average distance to the centre-of-mass 
and the quads, which is time consuming. Most other additions are relatively 
cheap. Second, when an evaluator is a good predictor of the situation, a best 
move found at a shallow search is more likely to stay good and therefore to cause 
cut-offs at deeper searches. For example, when the MIA I evaluator is used in 
the current search engine it searches 75 per cent more nodes compared to the 
MIA IV evaluator. The advantage of fixing the depth is that we can measure 
the influence of increasing the depth. 



Evaluator    MIA I    MIA II   MIA III   MIA IV
MIA I           0      259      199        71.5
MIA II        741        0      373       163.5
MIA III       801      627        0       248.5
MIA IV      928.5    836.5    751.5         0

Table 2. Tournament results at depth 4. 


Evaluator    MIA I    MIA II   MIA III   MIA IV
MIA I           0      188    168.5        51
MIA II        812        0      356       174
MIA III     831.5      644        0       223.5
MIA IV        949      826    776.5         0

Table 3. Tournament results at depth 6. 


Evaluator    MIA I    MIA II   MIA III   MIA IV
MIA I           0      137    159.5        41.5
MIA II        863        0      360       129
MIA III     840.5      640        0       205
MIA IV      958.5      871    795.0         0

Table 4. Tournament results at depth 8. 


Evaluator    MIA I    MIA II   MIA III   MIA IV
MIA I           0     97.5    137.5        44.5
MIA II      902.5        0    359.5       121.5
MIA III     862.5    640.5        0       234.5
MIA IV      955.5    878.5    765.5         0

Table 5. Tournament results at depth 10. 



In Tables 2-5 the results of the tournaments are given for searches to depth 4, 
6, 8, and 10, respectively. Each entry gives the points scored by the row 
evaluator against the column evaluator. MIA IV defeats the previous evaluators 
of MIA with ease. Even the strong MIA III is not able to score more than 20 to 
25 percent of the points against MIA IV. Although MIA II’s only major 
improvement is a primitive mobility component, it not only outperformed MIA I, 
but also played much better against MIA III and IV than MIA I did. Interestingly, 
the weak MIA I performs worse at deep searches, whereas the opposite holds for 
the strong MIA IV evaluator. A reason might be that on the one hand a deep 
search is not able to compensate for the lack of knowledge of MIA I, while on the 
other hand a deep search exploits more of the potential of MIA IV. 
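
The percentages quoted above follow directly from the tables. The short sketch below recomputes them for two pairings, using only numbers copied from Tables 2-5; the dictionary and function names are ours, and the total of 1000 points per pairing is taken from the description of the tournament.

```python
# Points scored by MIA IV against MIA III, per search depth (Tables 2-5).
MIA4_VS_MIA3 = {4: 751.5, 6: 776.5, 8: 795.0, 10: 765.5}
# Points scored by MIA I against MIA IV, per search depth (Tables 2-5).
MIA1_VS_MIA4 = {4: 71.5, 6: 51.0, 8: 41.5, 10: 44.5}

GAMES_PER_PAIRING = 1000


def percentage(points, games=GAMES_PER_PAIRING):
    """Share of the available points, in per cent."""
    return 100.0 * points / games


for depth in sorted(MIA4_VS_MIA3):
    mia3_share = 100.0 - percentage(MIA4_VS_MIA3[depth])
    mia1_share = percentage(MIA1_VS_MIA4[depth])
    print(f"depth {depth:2d}: MIA III scores {mia3_share:.2f}% against MIA IV, "
          f"MIA I scores {mia1_share:.2f}% against MIA IV")
```

MIA III's share stays between 20 and 25 per cent at every depth, while MIA I's share against MIA IV is lower at depths 6, 8, and 10 than at depth 4.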

5. Conclusions and Future Research 

In this paper we have seen that MIA IV defeats the older evaluators by large 
margins. Most additions of MIA IV in knowledge are quite simple to evaluate 
and lead to big rewards in playing strength. It turns out that MIA IV even 
performs better at deeper searches. 

More patterns of blocked pieces, better distinction of move types in the 
mobility component, and additional knowledge about whether a connection is 
important are some of the issues which could improve the evaluator. There is 
still room to fine-tune certain weights and parameters in the evaluation function. 
Until now the authors of the strong programs YL and Mona have not published 
the details of their programs’ evaluators. If their knowledge becomes available, 
combining their ideas with MIA IV would probably further increase the playing 
strength significantly. 

Appendix: Start Positions 

Below the positions are given, which are used in the experiments of Section 4. 

[Board diagrams of the ten start positions; of the diagram labels only 
"6: BTM", "7: BTM", and "10: BTM" survive.] 

Abstract We present an algorithm which determines the outcome of an arbitrary Hex 
game-state by finding a winning virtual connection for the winning player. Our 
algorithm performs a recursive descent search of the game-tree, combining fixed 
and dynamic game-state virtual connection composition rules with some new 
Hex game-state reduction results based on move domination. The algorithm is 
powerful enough to solve arbitrary 7x7 game-states; in particular, we use it to 
determine the outcome of a 7x7 Hex game after each of the 49 possible opening 
moves, in each case finding an explicit proof-tree for the winning player. 

Keywords: Hex, virtual connection, pattern set, move ordering, move domination, 
game-state reduction 

1. Introduction 

Hex is the classic two-player board game invented by Piet Hein in 1942 and 
independently by John Nash around 1948 (Gardner, 1959; Nasar, 1998). The 
board consists of a rhombus-shaped n x n array of hexagons, also called cells. 
Each player is assigned a set of stones and two opposing board sides, all with 
the same colour; say Black gets black stones and sides, while White gets white 
stones and sides. Players alternately place a stone on an unoccupied cell. The 
first player to form a path connecting his/her two sides with his/her stones wins 
the game. See Figure 1. For more on Hex, see Browne (2000) and Hayward 
and Van Rijswijck (200x). 

In Hex, an unrestricted opening allows the first player to gain a considerable 
advantage: it is known that there exists a winning strategy for the first player 
(Gardner, 1959), and while no explicit strategy which holds for arbitrary sized 
boards is known, most players believe that opening in the centermost cell in 
particular is a winning move. In order to offset this opening move advantage, 
the game is often started according to the following “swap rule”: colours are 
assigned to the four sides of the board, but not to the players; one player then 
places a stone on any cell; the other player then chooses which colour stones 




R. Hayward, Y. Bjornsson, M. Johanson, M. Kan, N. Po, J. van Rijswijck

Figure 1. An empty 7x7 board, and a finished game; Black wins.



to play with. The second move is played by the player whose stones are the 
opposite colour of the first stone. From then on, the game continues in normal 
fashion, namely with players alternating moves. 

With respect to Hex, a board-state describes a particular placement of some 
number of black stones and some number of white stones, such that each cell 
has at most one stone. We assume no constraint on the relative number of stones 
of each colour, as the game may have started with a handicap advantage for one 
of the players. The empty board-state has no stones on the board. A k-opening 
is a board-state with exactly k stones on the board. A turn-state describes 
which player has the next move. A game-state, or simply a state, consists of
a board-state and a turn-state. We denote by G = [P, B] the game-state with
turn-state P and board-state B; for this game-state, we say that P wins G if P
has a winning strategy for G. For a board-state B, we say that P wins B if P
wins G = [P, B].
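
The definitions above map directly onto a small data structure. The following Python sketch is illustrative only: the names `GameState`, `place`, and `is_k_opening`, and the (row, column) cell coordinates, are assumptions introduced here and do not come from the paper.

```python
from typing import Dict, NamedTuple, Tuple

Cell = Tuple[int, int]            # hypothetical (row, column) coordinates
BoardState = Dict[Cell, str]      # occupied cell -> 'B' or 'W'; at most one stone per cell


class GameState(NamedTuple):
    """A game-state G = [P, B]: a turn-state P together with a board-state B."""
    turn: str                     # 'B' or 'W', the player with the next move
    board: BoardState


def is_k_opening(board: BoardState, k: int) -> bool:
    """A k-opening is a board-state with exactly k stones (of either colour)."""
    return len(board) == k


def place(board: BoardState, cell: Cell, colour: str) -> BoardState:
    """Return a new board-state with a stone of `colour` added at an empty cell."""
    if cell in board:
        raise ValueError(f"cell {cell} is already occupied")
    return {**board, cell: colour}


empty = {}                                    # the empty board-state
opening = place(empty, (3, 3), 'W')           # a 1-opening
state = GameState(turn='B', board=opening)    # Black has the next move
```

Because no constraint is placed on the relative number of stones of each colour, the board-state is kept separate from the turn-state rather than deriving the turn from the stone counts.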

A state is solved if the winning player is known, and explicitly solved if a 
winning strategy is known. As we have already remarked, for arbitrarily large 
boards, Hex has been solved for the empty board-state, but not explicitly solved. 

In this paper we consider the problem of solving arbitrary Hex states, and 
present an algorithm which solves this problem. The worst-case running time 
of our algorithm is exponential in the number of cells in the board, which is not 
surprising given that solving arbitrary Hex states is PSPACE-complete (Reisch, 
1981). As a benchmark for the efficiency of our algorithm, we solve all 7x7
1-openings. Previously known 1-opening results are summarized in Figure 2.

Our results yield the first computer solution of any Hex state on a 7x7 or
larger board. Solving Hex states on 5x5 or smaller boards is a computationally
routine task. To solve arbitrary 6x6 Hex states, Van Rijswijck (2000, 1999-
2003) used an alpha-beta search guided by a Hex-specific evaluation function;
his algorithm solved all 1-openings and many longer openings. As this method
was not strong enough to solve 7x7 states, he further described but did not
implement an alternative recursive-descent algorithm (Van Rijswijck, 2002).
Recently Yang et al. solved by hand several 7x7 1-openings (Yang et al., 2001,
2002a), one 8x8 1-opening (Yang et al., 2002b), and one 9x9 1-opening (Yang,
2003).

Figure 2. Previously known 1-opening results. The stone on each cell indicates the winner
with perfect play if White’s first move is to that cell. For cells with no stone, the winner was not
previously known. The 6x6 results were obtained by Van Rijswijck by computer (Van Rijswijck,
2002). The 7x7 results were obtained by Yang et al. (2001, 2002) by hand.

Our algorithm solves an arbitrary Hex state by computing a winning virtual 
connection according to dynamic-state composition rules. Following the re- 
cursive descent game-tree search proposed by Van Rijswijck, our algorithm is 
enhanced by the computation of fixed-state virtual connections; additionally, 
some new Hex move domination and state reduction results allow significant 
pruning of the game-tree. 

Before presenting our algorithm in Section 4 and our 7x7 results in Section 5, 
we provide necessary background information on virtual connections in Section 
2 and state reductions in Section 3. 

2. Connection Sets 

Roughly, a connection set in Hex is a subgame in which one of the players 
can form a connection between two specified sets of cells. If the player can 
connect the two sets even if the opponent moves first, the connection set is called 
a virtual connection or link ; if the player must have the first move in order to 
guarantee the connection, the connection set is called a weak connection or 
prelink. 

More formally, with respect to a fixed Hex state, a player P, sets of cells 1
X, Y, and a set of cells S, (P:X, S, Y) is a virtual connection or link if there
exists a strategy whereby, in the game restricted to the set of cells X ∪ S ∪ Y,
P can form a chain connecting at least one cell of X with at least one cell of
Y, even if P’s opponent moves first; in other words, (P:X, S, Y) is a virtual
connection if there exists a second-player-win strategy for P to connect X and



1 Here each of the four sides is also considered as a cell.
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Figure 3. A virtual connection formed by weak connections. Each of the three leftmost figures 

shows a Black weak connection, indicated by the dotted cells, from the black stone to the bottom 
right side; the white dot indicates a cell whose occupation would transform the weak connection 
into a virtual connection. The common intersection of these weak connections is empty, so their 
union forms a Black virtual connection from the black stone to the bottom right side, shown in 
the rightmost figure. 

Y in the game restricted to X ∪ S ∪ Y. Analogously, (P:X, S, Y) is a weak
connection or prelink if there exists a strategy whereby, in the game restricted
to the set of cells X ∪ S ∪ Y, P can form a chain connecting X and Y if P
moves first; in other words, (P:X, S, Y) is a weak connection if there exists
a first-player-win strategy for P to connect X and Y in the game restricted to
X ∪ S ∪ Y. A P-link (respectively P-prelink) is a link (prelink) for player P.
See Figure 3.

In this paper, all virtual and weak connections have the form (P:X, S, Y) 
where X and Y each consist of a single cell; we denote such connections 
(P:x, S, y) where now x and y represent single cells instead of sets of cells. 

Although defined slightly differently by different authors, virtual connections 
have long been recognized as being central to Hex strategy. References to 
virtual connections permeate the Hex literature, where they are also referred 
to as “connections” or “safe groups”. For example, virtual connections are 
discussed by Berge (1977) 2 and Browne (2000). 

Virtual connections are useful in solving states since, when accompanied by 
an explicit strategy, a virtual connection serves as a proof or certificate that a 
pair of cells can be connected. 

In particular, if P has a virtual connection (P:x, S, y) where x and y are the
two sides belonging to P, then this virtual connection certifies that P wins the
game. For this reason, we call (P:x, S, y) a win-link (respectively win-prelink)
if it is a link (prelink) and x and y are P’s two sides. Since the sides of each
player are fixed, we will sometimes abbreviate (P:x, S, y) by P:S whenever
x, y are the sides of P.

Connection sets are particularly effective in Hex end-game analysis. For 
example, the following is a restatement in our terminology of an observation 
made by Berge. 



2 A translated version appears in Hayward (2003a).
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Theorem 1 (Berge, 1977; Hayward , 2003a) Consider a state in which a 
player P has the next turn and P’s opponent Q has one or more win-prelinks.
Then Q wins unless P’s next move is to a cell which lies in every Q-win-
prelink, for otherwise Q can on the next move convert a win-prelink to a
win-link.

In light of this result, for any fixed state and a player P with opponent Q , we 
refer to the set of unoccupied cells in the intersection of all Q -win-prelinks as 
P’s mustplay region. Notice that the computation of a mustplay region is a 
form of null move analysis, as it involves the consideration of what can occur 
if a player skips a turn. 
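
As a minimal sketch of the mustplay computation, assuming each win-prelink is represented simply by its set of cells (the function name `mustplay_region` and the cell labels in the example are hypothetical):

```python
def mustplay_region(unoccupied, opponent_win_prelinks):
    """Unoccupied cells that lie in every known opponent win-prelink.

    With no known opponent win-prelink there is no restriction, so every
    unoccupied cell is returned. An empty result means that every move of
    the player to move loses (Theorem 1).
    """
    region = set(unoccupied)
    for carrier in opponent_win_prelinks:
        region &= set(carrier)
    return region


# Hypothetical example: two opponent win-prelinks overlapping in two cells.
cells = {'a1', 'a2', 'b1', 'b2', 'c3'}
prelinks = [{'a1', 'a2', 'b1'}, {'a2', 'b1', 'b2'}]
region = mustplay_region(cells, prelinks)
```

In this example only `a2` and `b1` remain as candidate moves; a third win-prelink disjoint from both would empty the region.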

A useful feature of virtual connections is that smaller ones can be combined 
in various ways to form larger ones. The knowledge of this fact is as old as 
Hex itself; for example, it is discussed in detail by Berge (1977) and Hayward 
(2003a). Recently, Anshelevich (2002) used the following set of combining 
rules to compute connection sets in an inductive or “bottom-up” fashion. A 
P-stone is a stone belonging to P; ∅ denotes the empty set.

Theorem 2 (Anshelevich, 2002) (P:x, ∅, y) is a virtual connection if x and
y are adjacent. Also, if (P:x, S, y) and (P:y, T, z) are virtual connections
and {x} ∪ S and T ∪ {z} do not intersect, then (P:x, S ∪ {y} ∪ T, z) is
a virtual connection if y is occupied by a P-stone and a weak connection
if y is unoccupied. Also, if (P:x, S_1, y), (P:x, S_2, y), ..., (P:x, S_k, y) are
weak connections and the common intersection of the sets S_j is empty, then
(P:x, S, y) is a virtual connection, where S is the union of the sets S_j.

Notice that this set of rules is static, in that it yields a class of connection sets for 
a fixed state. This set of rules is not sufficient to establish all virtual connections 
of a state, and is thus not strong enough to solve all Hex states. However, the 
rules do yield a sufficiently large class of virtual connections to provide an 
effective subroutine of a strong Hex-playing program (Anshelevich, 2002). 
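
The two combining rules of Theorem 2 can be sketched as follows. This is an illustration under stated assumptions, not Anshelevich's implementation: a connection for a fixed player P is represented as a tuple `(x, carrier, y)` with a `frozenset` carrier, and the names `and_rule` and `or_rule` are introduced here.

```python
def and_rule(conn_xy, conn_yz, y_has_p_stone):
    """Combine virtual connections (x, S, y) and (y, T, z).

    Returns ('link', c) if y holds a P-stone, ('prelink', c) if y is empty,
    or None if the disjointness condition of Theorem 2 fails.
    """
    x, S, y = conn_xy
    y2, T, z = conn_yz
    if y != y2 or ({x} | S) & (T | {z}):
        return None
    kind = 'link' if y_has_p_stone else 'prelink'
    return kind, (x, S | {y} | T, z)


def or_rule(prelinks):
    """Combine weak connections (x, S_j, y) that share both endpoints.

    Returns the virtual connection (x, union of S_j, y) if the S_j have
    empty common intersection, and None otherwise.
    """
    endpoints = {(x, y) for x, _, y in prelinks}
    if len(endpoints) != 1:
        return None
    carriers = [S for _, S, _ in prelinks]
    if frozenset.intersection(*carriers):
        return None
    x, y = endpoints.pop()
    return (x, frozenset().union(*carriers), y)


# Example: x and y share two empty common neighbours a and b (a "bridge").
E = frozenset()
kind_a, via_a = and_rule(('x', E, 'a'), ('a', E, 'y'), y_has_p_stone=False)
kind_b, via_b = and_rule(('x', E, 'b'), ('b', E, 'y'), y_has_p_stone=False)
bridge = or_rule([via_a, via_b])
```

Each route through a single empty neighbour is only a weak connection, but the two carriers {a} and {b} are disjoint, so the second rule yields a virtual connection with carrier {a, b}.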

As Van Rijswijck observed, an alternative method of computing connection
sets is to proceed through the game-tree dynamically. Let G = [P, B] be a state
and let Q be the opponent of P. For each unoccupied cell x of B, let B + x be
the board-state obtained by adding to B a P-stone at x, and let G + x be the
associated state, namely G + x = [Q, B + x]. Call a collection of sets mutually
exclusive if the intersection of all the sets is empty. Van Rijswijck’s comments
suggest the following rules for solving a state.

Figure 4. A white win-prelink, and a corresponding win-link.

Theorem 3 (Van Rijswijck, 2002) If P (respectively Q) has a winning chain
in B then P:∅ (respectively Q:∅) is a win-link. If neither player has a winning
chain in B, then

■ P wins G if and only if P wins G + x for some move x; in this case,
G + x has some win-link P:S and G has a win-prelink P:S + x,

■ Q wins G if and only if Q wins G + x for all moves x; in this case, for
each x, G + x has a win-prelink Q:S_x and any collection C consisting
of one such S_x for each x is mutually exclusive; also, for each C' ⊆ C,
if C' is mutually exclusive then the union U of the elements of C' is a
win-link Q:U for G.

Figure 5 illustrates this theorem. The root state G is a loss for White. Three
of White’s possible moves are explored. In each state G + x_i, the move y_i
yields a black win; the resulting state G + x_i + y_i has a black win-link S_i, so
G + x_i has a black win-prelink S_i ∪ y_i; this win-prelink implies that x_i loses in
G, and moreover that any white move outside of S_i ∪ y_i loses. The set of these
three win-prelinks is mutually exclusive. Indeed, the set containing just the
win-prelinks S_2 ∪ y_2 and S_3 ∪ y_3 is already mutually exclusive, which means
that the union of these two prelinks is a black win-link in G. It also means that
the exploration of these two branches of the game-tree is sufficient to determine
that White loses G; the consideration of any other move is unnecessary.

We omit the proof of correctness of the preceding theorem, which follows by
elementary game-theory arguments from the fact that any Hex state has exactly
one winner. 3 Notice that these rules are by their definition complete: they can
be used to solve an arbitrary Hex state.

From a computational point of view, the difficulty with both of these sets of 
rules is that the number of possible connection sets that can be computed in 
this way is exponential in the number of cells. For this reason, an exhaustive 
approach to computing connection sets based on either rule set will be forced 
to limit the number of intermediate connection sets computed. For example, 
Anshelevich’s (2002) game-playing program has maximum effectiveness when 
the number of x-to-y connection sets stored is limited to about 40 per pair of 
cells x, y.

For both the static and dynamic computational processes, what is needed 
is some way of distinguishing those intermediate connection sets which are 



3 This fact in turn requires some care to prove; see for example Beck (1969), Gale (1979), and Hayward and 
Van Rijswijck (200x). 

Figure 5. An example illustrating Theorem 3. Panels, for i = 1, 2, 3:
G (White to move loses, Black win-link S = {S_2 ∪ y_2} ∪ {S_3 ∪ y_3});
G + x_i (Black wins, win-prelink S_i ∪ y_i);
G + x_i + y_i (White loses, win-link S_i).

critical to solving the particular state from those which are not. We close this 
section by giving evidence that this is likely to be a difficult problem. 

Assume that at some point in a computation involving the dynamic rules it 
is discovered that player P has no winning move in a state G. It follows that 
P’s opponent Q has a win-prelink S_x after each possible move x by P and
that the union of any collection of these win-prelinks which have an empty 
intersection establishes a win-link for Q. If G is an intermediate state in the 
process of solving some earlier state, then P needs to compute such a win-link 
to pass back to the state which gave rise to G. It is reasonable to expect that a 
useful win-link to pass back would be one that has the smallest number of cells, 
among all such possible win-links. However, it is also reasonable to expect 
this problem to be computationally difficult, since it seems to be intimately 
related to determining the outcome of a Hex game, which we have already 
noted is PSPACE-complete. Saks (2003) observed that this problem is indeed 
computationally difficult, as we now explain. 

Formally, the Min-Union Empty Intersection Problem (MUEIP) is the
decision problem which takes as input an integer k together with a set S =
{S_1, ..., S_t} of subsets of a finite set V and asks whether there is a subset T
of S whose element-intersection is empty and whose element-union has size
at most k. Min Cover is the decision problem which takes as input an integer
k together with a set A = {A_1, ..., A_t} of subsets of a finite set V and asks
whether there is a subset of at most k elements of A whose union is V.

Theorem 4 (Saks, 2003) MUEIP is NP-complete. 

Proof. Consider an instance of Min Cover, where k, A, and V are as defined
above and n = |V|. This instance can be transformed in polynomial time into
an instance of MUEIP, as follows.

For each index j, let B_j be the set complement (with respect to V) of A_j,
and let B = {B_1, ..., B_t}. Observe that the union of k elements of A is equal
to V if and only if the intersection of the corresponding k elements of B is
empty. Let V' be the set obtained by adding t(n + 1) new elements to V. For
each index j, let B'_j be the set obtained by adding n + 1 of the new elements to
B_j in such a way that each B'_j gets expanded by a set of new elements disjoint
from all other new elements. Let B' = {B'_1, ..., B'_t}. Observe that a set of
k elements of B has empty intersection if and only if the corresponding set
of k elements of B' has empty intersection, and this occurs if and only if the
same set of k elements of B' has empty intersection and union with size at most
k(n + 1) + n. Conversely, any subset of B' whose union has size at most
k(n + 1) + n has at most k elements, since each element contributes n + 1 new
elements of its own.

Since MUEIP is clearly in NP, the theorem follows from the preceding trans- 
formation and the fact that Min Cover is NP-complete (Karp, 1972). □ 
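
The transformation in the proof can be checked on small instances. The sketch below builds the MUEIP instance from a Min Cover instance and compares brute-force answers to both problems; the function names and the tagging of new elements as `('new', j, i)` tuples are assumptions made for illustration, and the brute-force deciders are exponential and meant only for tiny inputs.

```python
from itertools import combinations


def min_cover_to_mueip(V, A, k):
    """Map a Min Cover instance (V, A, k) to a MUEIP instance (B', bound)."""
    V = frozenset(V)
    n = len(V)
    B_prime = []
    for j, A_j in enumerate(A):
        # n + 1 fresh elements private to index j (assumed not to occur in V)
        fresh = {('new', j, i) for i in range(n + 1)}
        B_prime.append((V - set(A_j)) | fresh)
    return B_prime, k * (n + 1) + n


def mueip(sets, bound):
    """Brute force: is there a non-empty subcollection with empty
    intersection whose union has size at most `bound`?"""
    for m in range(1, len(sets) + 1):
        for T in combinations(sets, m):
            if not frozenset.intersection(*T) and len(frozenset().union(*T)) <= bound:
                return True
    return False


def min_cover(V, A, k):
    """Brute force: do at most k elements of A have union V?"""
    V = set(V)
    return any(set().union(*C) == V
               for m in range(1, min(k, len(A)) + 1)
               for C in combinations(A, m))


V = {1, 2, 3, 4}
A = [{1, 2}, {3, 4}, {2, 3}, {1}]
```

For this example, {1, 2} and {3, 4} cover V, so both problems answer yes for k = 2 and no for k = 1.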

Since using virtual connections alone to solve arbitrary Hex states is likely 
to be computationally difficult, some extra game knowledge must be used to 
reduce the complexity of searching through the game-tree. We discuss some 
such reductions in the next section. 

3. Move Domination and Game-State Reduction 

One reason that Hex is a challenge for computers to play or solve is the 
high branching factor; especially in the early stages of the game, the number 
of possible moves is high. In this section we describe some move ordering 
information which considerably strengthens the algorithmic approach implicitly 
described by the virtual connection composition rules of the previous section. 

A particularly useful form of move ordering information is move domination. 
Informally, one move dominates another if the former is at least as good as the 
latter. Since we are interested here only in solving states, namely in determining 
which player has a win-strategy, one move is “at least as good as” another if 
the former yields a win whenever the latter yields a win. Formally, for possible
moves u, v from a state [P, B], we say that u dominates v if P wins [Q, B + u]
whenever P wins [Q, B + v].
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Domination results are useful for our purposes since any dominated move 
can be ignored in searching for a winning move. Unfortunately, few results 
have been proved to date on domination in Hex. Beck (1969) proved that on
an empty board size 2x2 or larger, moving to an acute corner (for example, A1
in Figure 7 4) is a losing, and so dominated, move. Using similar arguments,
Hayward (2003b) recently obtained a move domination result involving certain
three-cell configurations, as we now explain.

For a player P, a side cell is any cell which borders one of P’s two sides,
a side pair {x_1, x_2} consists of two adjacent side cells which border the same
side, and a side triangle (x_1, x_2, t) consists of a side pair {x_1, x_2} together
with a third cell, called the tip, adjacent to the two side cells. See Figure 7. A
P-triangle is a side triangle belonging to P.



Theorem 5 (Hayward, 2003b) Let P be a player with opponent Q and let
B be a board-state with an empty P-triangle (x_1, x_2, t). For each subset S of
T = {x_1, x_2, t}, let B + S be the board-state obtained from B by adding a
P-stone at each cell of S.

Then, for each j = 1, 2, P wins [Q, B + t] if P wins [Q, B + x_j]. Also, P
wins any one of the four states [Q, B + t], [Q, B + {t, x_1}], [Q, B + {t, x_2}],
[Q, B + {t, x_1, x_2}] if and only if P wins all of them.



Our algorithm uses the above results in the following two ways. Firstly, for 
any state [P, B] with an empty P-triangle, P can ignore the two moves to the 
side of the triangle, since they are dominated by the move to the tip. Secondly, 
for any state [Q, B] with a P-triangle with a P-stone at the tip and the two side 
cells empty, P-stones can be added to the two side cells, since this addition
does not change the outcome of the game. As can be seen from Figure 12, the 
second result is particularly useful when combined with our virtual connection 
computation approach. 
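
The two uses of Theorem 5 can be sketched in a few lines. The coordinate system here is an assumption and is not the one of Figure 7: cells are (row, column) pairs on an n x n rhombus, Black ('B') owns the sides bordering rows 0 and n-1, and White ('W') owns the sides bordering columns 0 and n-1. All function names are hypothetical.

```python
STEPS = [(-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0)]


def neighbours(n, cell):
    """Cells adjacent to `cell` on an n x n rhombus."""
    r, c = cell
    return {(r + dr, c + dc) for dr, dc in STEPS
            if 0 <= r + dr < n and 0 <= c + dc < n}


def side_triangles(n, player):
    """All side triangles (x1, x2, tip) of `player`."""
    axis = 0 if player == 'B' else 1
    triangles = []
    for edge in (0, n - 1):
        side_cells = sorted((r, c) for r in range(n) for c in range(n)
                            if (r, c)[axis] == edge)
        for x1, x2 in zip(side_cells, side_cells[1:]):   # adjacent side pairs
            for tip in neighbours(n, x1) & neighbours(n, x2):
                triangles.append((x1, x2, tip))
    return triangles


def dominated_side_moves(n, stones, player):
    """First use: side cells of empty triangles, dominated by the tip."""
    return {x for x1, x2, tip in side_triangles(n, player)
            if all(cell not in stones for cell in (x1, x2, tip))
            for x in (x1, x2)}


def fill_in(n, stones, player):
    """Second use: where `player` holds the tip and both side cells are
    empty, add `player` stones to the side cells. Intended for states in
    which the opponent of `player` is to move. Returns a new board-state."""
    filled = dict(stones)
    for x1, x2, tip in side_triangles(n, player):
        if stones.get(tip) == player and x1 not in stones and x2 not in stones:
            filled[x1] = filled[x2] = player
    return filled
```

Under these assumptions each player has 2(n - 1) side triangles, so a 7x7 board has 12 per player.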








Figure 6. Illustrating the second part of Theorem 5. Applying this result to the white side 
triangle with tip E2, it follows that a player has a winning strategy for one of these board-states 
if and only if that player has a winning strategy for all of these board-states. 



4 Throughout this paper, whenever we need to refer to a particular board cell, we assume that the board is
oriented as in Figure 7, and use the coordinate system shown there.

4. The Algorithm

Our algorithm Solver combines the approaches suggested by Theorems 1,
2, 3, and 5. For a player P with opponent Q, the algorithm solves a state
G = [P, B] as follows.

Algorithm Solver(G = [P, B]) For each side triangle for which the
second part of Theorem 5 applies, add stones to the appropriate side cells; call
the resulting board B*. Statically compute virtual and weak connections. If a
win-prelink for P or a win-link for Q is detected, then return the link (and if
the win-link uses the tip of a triangle whose side was filled in, then add the side
cells to the link).

Otherwise, let T be the set consisting of all Q-win-prelinks for G and let R
be the P-mustplay region. If T is empty, then initialize R to be all unoccupied
cells; otherwise, initialize R to be the intersection of all elements of T. Remove
from R any side-cells from any empty P-triangle. While R is not empty, pick a
cell x in R, and do the following:

Let B_x be the board-state obtained from B* by adding a P-stone at x
and, if x was the tip of an empty P-triangle before this move, filling
in the triangle. Recursively solve G_x = [Q, B_x].

If P wins G_x, say with win-link X, then add to X the cell x as
well as the two associated side-cells if x was the tip of an empty
P-triangle, and exit the while loop and return. If Q wins G_x, say
with win-prelink X, then add X to T and remove from R every cell
not in X.

If the while loop terminates without discovering a win-prelink for P, then the
union of elements of T forms a win-link for Q.

A sample execution of the algorithm is described in Figures 7 through 9. 
The correctness of our algorithm follows easily from the previous theorems; 
we omit the proof. 
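
The core loop of the algorithm can be sketched as follows. This is a deliberately reduced illustration, not the authors' program: it keeps only the dynamic rules of Theorem 3 and the mustplay narrowing of Theorem 1, and omits the static virtual connection computation and the side-triangle steps, so it is practical only on very small boards. The coordinates, the side assignment ('B' owns rows 0 and n-1, 'W' owns columns 0 and n-1), and the names `has_chain` and `solve` are assumptions.

```python
from collections import deque

STEPS = [(-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0)]


def has_chain(n, stones, player):
    """True if `player` already connects his/her two sides."""
    axis = 0 if player == 'B' else 1
    start = [c for c, p in stones.items() if p == player and c[axis] == 0]
    seen, queue = set(start), deque(start)
    while queue:
        cell = queue.popleft()
        if cell[axis] == n - 1:
            return True
        r, c = cell
        for dr, dc in STEPS:
            nb = (r + dr, c + dc)
            if stones.get(nb) == player and nb not in seen:
                seen.add(nb)
                queue.append(nb)
    return False


def solve(n, turn, stones):
    """Solve the state [turn, stones]; return (winner, carrier).

    The carrier is a win-prelink if the winner is the player to move and
    still needs a move, and a win-link otherwise.
    """
    other = 'W' if turn == 'B' else 'B'
    for p in (turn, other):
        if has_chain(n, stones, p):
            return p, frozenset()
    # R: the mustplay region, initially every unoccupied cell
    region = {(r, c) for r in range(n) for c in range(n) if (r, c) not in stones}
    prelinks = []                                  # T: opponent win-prelinks
    while region:
        x = min(region)                            # naive move ordering
        winner, carrier = solve(n, other, {**stones, x: turn})
        if winner == turn:
            return turn, carrier | {x}             # win-link plus x: a win-prelink
        prelinks.append(carrier)
        region &= carrier                          # moves outside the prelink lose
    return other, frozenset().union(*prelinks)     # mutually exclusive: a win-link


winner, carrier = solve(3, 'B', {})                # empty 3x3 board, Black to move
```

The loop stops as soon as the collected win-prelinks have empty common intersection, which is exactly the mutual-exclusivity condition of Theorem 3; the choice `min(region)` stands in for whatever move ordering a real implementation would use.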




Figure 7. Solver solves b6: initialization. After the initial move (left), the game-state 
is reduced by applying Theorem 5 and adding white stones to the two side-cells of the white 
side-triangle with tip b6. In the resulting state, White has two win-prelinks (center-left and 
center-right) whose resulting intersection yields a 13-cell black mustplay region (right). If Black 
has a winning move, it has a winning move to one of these 13 cells. 
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Figure 8. Solver solves b6-c4. As shown by the Solver b6 recursion tree in Figure 13, c4 
is the first black response considered to the white b6 opening (left). Following the topmost path 
b6-c4-f2-d5-d4-c5-e5-e4-g3-f3-g2-f4 in the recursion tree and applying Theorem 5 after f2 leads 
to the first solved state (center, with white win-prelink); since f4 is a leaf of the recursion tree, 
the white win-prelink here was discovered statically. Solver continues solving the c4-subtree, 
eventually determining that c4 is a black loss (right, with white win-prelink). This win-prelink 
does not contain c4 or b5, so, of the 13 possible b6-responses corresponding to the initial black 
mustplay region described in Figure 7, 11 moves remain to be checked.




Figure 9. Solver solves b6: conclusion. The move to f1 is the last black reply considered in
response to the white b6 opening (left, with white win-prelink), since after the discovery of this
last white win-prelink, the set of such win-prelinks has empty intersection. The union of these
11 white win-prelinks gives the final win-link for White (right).



5. Solver 7x7 1-Opening Solutions

As mentioned earlier, Solver is strong enough to solve arbitrary 7x7 states.
Figures 10 and 11 summarize the results obtained by running Solver on all
49 7x7 1-openings. Figures 13 and 14 show the Solver recursion trees from
two of these executions, while Figure 15 shows a longest line of play from
each of the 49 solutions. Each execution was performed on a single-processor
machine 5; in each case, the run time was roughly proportional to the number
of nodes in the Solver recursion tree, taking about one minute for the five 1-
openings with the smallest node-counts, and about 110 hours for the 1-opening
with the largest node-count; the total run time for all 49 1-openings was about
615 hours. A listing of all 49 trees (including a tree viewer) is available at
http://www.cs.ualberta.ca/~hayward/hex7trees.



5 The program was compiled with gcc 3.1.1 and run on an AMD Athlon 1800+ MHz processor with 512 MB
memory running Slackware Linux.








Figure 10. All 7x7 1-opening results, as found by Solver. The stone on each cell indicates 
the winner with perfect play if White’s first move is to that cell. The move indicated on each 
losing cell is the winning countermove discovered. 

Figure 11. Number of nodes in the Solver 7x7 1-opening recursion trees.

For any size Hex board, the set of winning open-move cell locations is sym- 
metric with respect to reflection through the center of the board. Notice that the 
Solver node-counts do not share this symmetry, as neither the order in which 
Solver considers moves nor the static computation of virtual connections is 
designed to reflect this symmetry. 
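
The reflection through the centre of the board is easy to state on cell names such as b6. The helper below is a hypothetical illustration; it assumes cell names consist of one letter followed by a number, as in the figures, and works whichever of the two denotes the row, since a point reflection reverses both coordinates.

```python
def reflect(cell, n=7):
    """Point-reflect a cell name such as 'b6' through the centre of an n x n board."""
    letter_index = ord(cell[0]) - ord('a')
    number = int(cell[1:])
    return f"{chr(ord('a') + n - 1 - letter_index)}{n + 1 - number}"


pairs = [(c, reflect(c)) for c in ('a1', 'b6', 'd4', 'f1')]
```

For example, b6 maps to f2 and f1 maps to b7, while the centre cell d4 maps to itself; by the symmetry noted above, an opening and its reflection have the same winner.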

Figure 12 demonstrates the relative strength of the three key parts of our 
algorithm, namely virtual connection computation, side-triangle move domi- 
nation, and side-triangle fill-in, by showing Solver node-counts when various 
of these features are turned off. In particular, notice that adding side-triangle 
fill-in to virtual connection computation results in a substantial decrease in the 
number of nodes considered, while further adding side-triangle domination has 
little effect. 

Figure 12. Number of nodes in the 6x6 1-opening recursion trees for Solver (top entry),
Solver-D, namely Solver without side-cell domination (middle entry), and Solver-FD,
namely Solver with neither side-cell domination nor fill-in (bottom entry). While corresponding
data were obtained for some 7x7 1-openings, Solver-FD in particular was too slow
to execute for all such openings. For example, the b7 Solver-FD tree has 824796 nodes,
compared to only 1196 for Solver.



In comparing the winning 7x7 opening moves (Figure 10) with winning 
opening moves on smaller boards (Figure 2), some features common to each of 
these n x n boards are worth noting. For example, 

■ the n cells on the short diagonal (obtuse corner to obtuse corner) are all
first-player winning openings,

■ the n — 1 cells on each of the first-player’s sides (except for the cell in 
the short diagonal) are all first-player losing openings. 

It would be of considerable interest to show whether these results hold in general, 
especially if the proof is positive (as opposed to say a single counterexample), 
since to date, for arbitrarily large n x n boards, 

■ no particular move is known to be a first-player win, 

■ the only moves which are known to be first-player losses are 

- for n ≥ 2, the two acute corner cells (Beck, 1969)

- for n > 3, the two cells each in the first-player’s side and adjacent
to the acute corner cell (Beck, 2000).
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Figure 13. The Solver recursion tree for the 7x7 opening White-b6 (with the ten nodes 
connected by dotted edges added so that every path ends with a winning move). For each node, 
the order of child generation is top-to-bottom. Each Solver recursion tree is a subtree of the 
complete game-tree, as the only replies to a winning move which appear in the recursion tree 
are those replies in that state’s mustplay region. For example, consider for the tree shown here 
the state G after White plays b6. As shown in the second diagram in Figure 7, White has a win-
prelink created by playing at c4 which does not contain d4; thus d4 is not in the black mustplay
region for G, so Solver never needs to consider the black move to d4, so d4 does not appear as a
child of b6 in this recursion tree. Notice from the tree shown here that in solving the b6 opening
the selection of d2 as the first move considered at the b6-c5-c3-c2 subtree was unfortunate, as 
d2 leads to a white loss whereas f2, the second move considered, leads to a white win. If f2 had 
been considered first, the d2 subtree would not have been explored, and the resulting recursion 
tree would have had only 97 nodes instead of 197. 



Figure 14. The Solver recursion tree for the 7x7 opening White-f1 (with the five nodes
connected by dotted edges added so that every path ends with a winning move). For each node,
the order of child generation is top-to-bottom. Notice that the f1-b6 subtree, which establishes
that b6 is a winning countermove to f1, is paradoxically smaller than the b6 subtree shown in
Figure 13, in part because the move ordering here is more fortunate than there. In this f1-tree,
whenever it is White’s turn to play, the first move considered turns out to be a winning move;
this is not the case in the b6 tree shown in Figure 13.
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123456789 

al c4 a6 f2 d3 b6 e4 d4 e3 e5 d5 el e2 fl a3 b2 c2 c3 
a2 e3 f4 d5 c5 d3 g2 e6 e5 c7 d6 d7 g5 f7 e7 f6 b6 d2 
a3 d4 c5 b4 c4 d2 c3 c2 b3 bl b2 cl e4 f2 e2 d5 d3 b7 

a4 b3 d3 d5 c5 d2 c2 b7 a7 b6 a6 b5 e5 c3 a5 bl a3 b2 

a5 b6 c5 e3 f4 d5 c6 c3 d3 el e2 fl gl f2 d2 dl g2 f3 

a6 fl c3 b5 c6 d4 c4 c5 a5 b4 a4 b3 a3 a7 b6 bl b2 cl 

a7 d5 c3 d2 c2 c4 a5 b4 a4 a6 e4 d4 e5 e3 b5 b3 a3 bl 

bl c4 a6 f 2 d3 b6 e4 d4 e3 e5 d5 el e2 fl a3 b2 c2 c3 

b2 e3 g2 gl a6 b3 c3 d2 f2 fl el e2 c2 b5 d4 c4 d3 d5 

b3 bl c4 c3 b6 fl b4 c2 b2 cl d2 d3 e3 d4 e4 e2 gl d5 

b4 fl c2 b7 a7 c3 b3 b5 c4 c5 e4 d4 e3 e5 d5 c7 b6 c6 

b5 fl c3 b6 c6 d2 c5 cl a2 b4 c4 b2 a3 b3 c2 dl e2 d3 

b6 c4 f 2 d5 d4 c5 e5 e4 g3 f3 g2 f4 g4 

b7 b6 e4 d5 e5 e3 g2 f3 g3 e6 f5 f6 c6 c5 d6 f4 d4 d3 

cl c4 a6 f 2 d3 b6 e4 d4 e3 e5 d5 el d2 c2 a3 dl b4 a4 

c2 d5 c5 b7 a7 b6 a6 b5 a5 b3 c4 fl b4 c3 e2 d3 e3 d4 

c3 d4 b6 b5 c5 cl c4 d2 a2 b2 a3 b3 a4 b4 c2 dl e2 d3 

c4 c3 b6 d3 e3 fl d2 e2 b3 b5 b4 c5 d4 d5 e5 e4 g3 f3 

c5 d5 c2 b7 a7 b6 a6 b5 a5 b3 c4 fl b4 c3 e2 d3 e3 d4 

c6 d3 c3 c4 e3 d5 c5 d4 e5 e4 a5 b4 a4 b3 a3 bl b2 b5 

c7 b6 e4 d5 e5 e3 g2 f3 g3 e6 f5 f6 c6 c5 d6 f4 d4 d3 

dl b6 c6 d4 e4 d5 e5 f2 e3 e2 

d2 c5 e4 d3 f2 e5 d5 f3 e3 c7 b6 c6 b5 c3 c4 d4 a4 b2 

d3 d4 b6 c4 a5 a6 b5 b3 c3 b4 e4 d5 e5 e3 g2 f3 g3 gl 

d4 d3 c4 c3 e3 fl c2 d2 el e2 gl f2 f3 g2 a4 b3 a3 bl 

d5 d4 f2 e4 g3 f3 g2 f5 e5 f4 c4 c5 a6 a7 b6 b7 c6 d3 

d6 e3 c4 d5 b6 c3 d3 b5 c5 el f2 e2 f3 e5 d4 e4 g4 f6 

d7 b6 e4 d5 e5 e3 g2 f3 g3 e6 f5 f6 c6 c5 d6 f4 d4 d3 

el b6 c6 d4 e4 d5 e5 f2 e3 e2 

e2 d5 e5 e4 c5 d3 e3 d4 c3 c4 g3 f4 g4 f3 g2 f2 gl f5 

e3 d5 b6 c4 c5 d4 b3 b4 e5 e4 g3 f2 f3 e2 d3 d2 c3 cl 

e4 e3 d4 d3 b3 b4 c3 c4 g2 gl f2 fl e2 el d2 b2 c2 d7 

e5 d5 e4 e3 d4 d3 b3 c2 b2 c4 a5 a4 b4 b6 c5 b5 g2 gl 

e6 f2 e3 d4 e2 e4 f3 f4 c4 c5 a6 a7 b6 b7 c6 d3 c3 b5 

e7 d3 a5 b3 e3 d4 e4 fl e2 el d2 dl c2 cl a2 b2 gl f2 

fl b6 e3 e5 d5 c7 b7 c6 d6 d7 

f2 e4 b6 d3 d4 e3 c3 c4 a5 a6 b5 b4 a4 

f3 d4 e5 f4 e4 f2 e2 e3 c4 c5 a6 a7 b6 b7 c6 d3 c3 b5 

f4 c5 f 5 d3 e3 fl gl f3 d4 e4 c4 d5 a6 a7 b6 b5 a5 b7 

f 5 d4 f2 f3 e3 e4 c4 c5 a6 a7 b6 b7 c6 d3 c3 b5 a5 c7 

f6 c5 a6 a7 g2 f5 e5 d6 b6 b7 c6 d5 c4 c7 e6 f3 e4 e3 

f7 d3 a5 b3 e3 d4 e4 fl e2 el d2 dl c2 cl a2 b2 gl f2 

gl b7 e5 e4 f3 d6 g4 f2 e3 f4 g3 e7 g6 g5 f5 f6 d7 e6 

g2 b7 e5 f3 e2 d3 e3 e4 d4 d5 a7 c5 g3 f4 g4 f2 gl f5 

g3 f 2 e3 c5 d4 c2 a2 b3 e2 d5 e5 e4 f3 f4 g4 f5 g5 f7 

g4 e7 c5 d3 c3 b6 c4 c7 c6 cl b7 d2 a2 b2 a3 b3 a4 b4 

g5 d4 c4 d3 e3 e4 c3 c5 a6 b3 b5 a7 b6 b7 c6 c7 d6 d7 

g6 c5 g2 f 5 c4 c3 e5 f3 f4 g3 b4 d3 a3 d4 a6 a5 b5 a7 

g7 d3 c3 c5 c4 e4 d5 d4 a6 a7 b6 b5 a5 b3 b4 b7 c6 c7 



10 11 12 13 14 15 16 17 18 19 

a4 d2 b5 a2 b3 a5 b4 a7 c5 c7 b7 c6 d6 d7 

c6 g1 f2 f1 e1 e2 a4 b3

a7 b6 a6 b5 e3 e5 c6 c7 d6 d7

e4 f1 e2 e1 g1 f2 g2 f3 g3 f4 g4 f5 g5 f7 f6 e7 e6 d7 c4

g3 e6 d6 e5 g5 f7 e7 f6 d4 e4

c2 d1 e1 d2 e2 d3 e4 d5 e5 e3 g2 f2 g1 f3 g3 f4 g4

b2 b6 c6 c5 g2 f3 g3 g1 f2 f1 e2 d3 e1 f4 g4

a4 d2 b5 a2 b3 a5 b4 a7 c5 c7 b7 c6 d6 d7

c5 b7 a7 b6 a5 b4 e4 e5 c6 c7 d6 d7

e5 f2 g2 f3 g3 f4 g4

d6 d7 e6 e7 g6 f6 g5 f5 g4 f4 g3

e3 d4 e4 e1 g1 d5 e5 f2 g2 f3 g3 f4 g4

g4 g1 c4 c3 f2 f1 e2 e1 d2 d1

b3 b1 e2 b2 f1 c1 b7 c6 a7 b5 a5 c3 d6 d7

e5 e4 g3 f3 g2 g1 f2 d2 e1 f4 g4

e4 d5 e5 e3 g2 f2 g1 f3 g3 f4 g4

g2 g1 f2 b2 c2 f4 g4

e5 e4 g3 f3 g2 g1 f2 d2 e1 f4 g4

a6 a7 b6 c1 c2 d1 d2 e1 e2 f1 g1 f2 g2 f3 g3 f4 g4

g4 g1 c4 c3 f2 f1 e2 e1 d2 d1

b3 c2 d6 d7

f2 f1 e2 e1 c2 f4 g4

b2 b4 a5

c3 b5 a5 c7 e6 b4 a4

e6 f5 d2 d1

g4 g1 c4 c3 f2 f1 e2 e1 d2 d1

g5 f7 f6 e7 e6 d7 c6 d6 a5 b5 a6 a7 b6 b4 a4

a2 a3 b2 f4 g4

c6 f3 g3 f4 g4

f2 f1 e2 e1 d2 c3 d1 f3 g3

a5 c7 d6 b4 a4

g2 f3 g3 f4 g4 f6 f5 e6 e5 d6 c5 b4



a5 c7 e6 d5 d6 b4 a4 

c6 c7 d6 d7 f6 e5 e6 b4 a4 

d6 d7 f6 e5 e6 b4 a4 

d4 c3 d3 e1 e2 f1 g1 f2 f4 g3 d2 d1

g2 f3 g3 f4 g4 f6 f5 e6 e5 d6 c5 b4 

c6 d5 c5 c7 a7 g2 e2 d4 c4 b6 a6 d3 c3 b5 a5 b4 a4 

g5 f7 f6 e7 e6 d7 d6 c7 b6 c6 a5 b4 a4 b5 a6 

f6 e7 e6 d7 d6 c7 

a5 b5 c2 d1 e3 d4 e4 d5 e5 f1 e1 e2 g1 f2 g2 f3 g3 d6 e6

e6 e7 g6 f5 f6 d1 c2 c1 a2 b2 d2 e1

b6 b1 b2 a4 b3 b7 c6 c1 c2 d1

d6 d7 e6 e7 f6 f7 g6 d1 c2 c1 a2 b2 d2 e1



Figure 15. Longest 7x7 Solver lines of play. For each of the 49 7x7 1-openings, the
corresponding line shows a longest line of play from the associated Solver solution. The
top row shows the move number of that column.

6. Conclusions and Open Problems

We have shown how combining static and dynamic virtual connection com- 
putation methods with some move domination results yields an algorithm strong 
enough to solve arbitrary 7x7 Hex states. A next step is to design an algorithm 
strong enough to solve 8x8 states; preliminary results suggest that this is con- 
siderably more difficult and that further techniques will be required. Another 
direction is to use Solver to gather 7x7 information which can be used to 
find better move ordering heuristics for Hex game-tree search on (much) larger 
boards; for example, such data would be useful in analyzing any local
configuration with effective board size at most 7x7.

Abstract This paper proposes a general and automated method that generates accurate
evaluation functions, without expert players’ knowledge of a target game. Patterns
(which are partial descriptions of a game state) are widely used as primitives of 
evaluation functions in game programming. They have to be carefully selected in 
order to generate accurate evaluation functions. Our approach consists of three 
steps: (1) generation of logic formulae by using the specifications of a target 
game, (2) translation of the formulae into patterns, and (3) selection of a set of 
suitable patterns from those generated. The problem, in the automated identi- 
fication of suitable patterns, is that it is difficult either to generate only useful 
patterns or to examine all possible patterns. The latter obstacle is due to the 
prohibitive numbers involved. We solved this dilemma by a combination of two 
methods, where one method generates patterns of good quality, and the other 
method entails a lightweight selection based on statistics that could handle a 
large number of candidates. Experiments in Othello revealed that about 100,000 
patterns from more than eight million automatically generated patterns could be 
successfully selected with our method, and that accurate evaluation functions 
were constructed. This accuracy is comparable to that of specialized Othello 
programs and is much better than that of the evaluation functions generated by 
existing general methods. 

Keywords: feature generation, feature selection, evaluation function, Othello 

1. General Game Players 

One of the most ambitious goals of artificial-intelligence research is the 
development of a general game player that can learn and play an arbitrary 
instance of a certain class of game. Strong game programs must have an accurate
and efficient evaluation function that can estimate the result of a game based
on the current position. Since an evaluation function is specific to a target game,
the development of general game players requires evaluation functions to be
automatically constructed without the assistance of human experts.

1.1 Learning of Evaluation Functions

A popular way of constructing an evaluation function is to make it a (linear) 
combination of evaluation primitives called features, and adjust the parameters 
of the combination (Samuel, 1967; Tesauro, 1992; Buro, 2002). Generally, the 
construction of evaluation functions requires the acquisition of features, and 
the training of a prediction model (e.g., linear combination). 

1.2 Learning of Features 

The main difficulty in constructing evaluation functions is identifying
appropriate features. In most preceding investigations, these features have been
provided by human experts for the game involved.

Our first goal is to identify appropriate features mechanically. To achieve
this we employed a method of constructing features written as logic programs
(we call them logical features). However, logical features are not practical
by themselves because evaluating logic programs is too slow. In contrast,
practical evaluation functions have been constructed with a large number of
patterns as features (Buro, 1998; Buro, 2002). A pattern is a logical formula in
a specific form. We introduce a rigorous definition for this in Subsection 3.3.
Even though a pattern is just a logical formula in a specific form, the mechanical
identification of suitable pattern sets to derive a good evaluation function is a
difficult task.

1.3 The Approach 

Our second goal is to construct efficient and accurate evaluation functions
through game-independent methods. Here we propose a combination of methods
that yields patterns similar to those of Buro (1998) by translation from
logical features. These methods are:

1. generation of logical features,

2. extraction of patterns from logical features, and

3. selection of suitable patterns.

A large number of patterns are produced in steps 1 and 2, and useful patterns 
are selected in step 3. The claim of the paper is that this selection is indispensable 
for generating useful evaluation functions. The reason why we have to generate 
such a large number of patterns in steps 1 and 2 is that they are required to achieve 
accuracy in the evaluation functions constructed. There is no known method of 
generating only useful patterns. 

The method of selection must be so lightweight that a machine can evaluate
numerous pattern candidates within practical time limitations. We demonstrate
the effectiveness of our solution through experiments.

The paper is organized as follows. Section 2 reviews related work and other
issues that need to be resolved to construct general game players. Section 3 
introduces the basic terminology. Methods to generate logical features and 
evaluate positions are briefly explained in Sections 4 and 5. In Section 6 a 
method of selection is proposed. Section 7 shows the experimental results in 
Othello. Section 8 concludes the paper. 

2. Related Work 

The construction of general game players requires, in addition to evaluation
functions, the acquisition of game-specific search enhancements such as realization
probabilities (Tsuruoka, Yokoyama, and Chikayama, 2002), opening books
(Lincke, 2001), and endgame books. This paper only addresses evaluation
functions, even though we are aware that our method can be applied to the
acquisition of other knowledge.

In constructing evaluation functions, the training of prediction models re- 
quires unbiased training positions and an appropriate labeling (Buro, 1998). It 
is well known that the usefulness of learned evaluation functions depends on 
the training positions used. Thus, unbiased positions are needed to develop 
strong programs. Because this paper primarily focuses on the acquisition of 
features, the experiments were conducted on a game where both the training 
positions and the labeling were available (near endgame in Othello). In games 
where these are not available, we can apply methods of gathering positions via 
self-play and temporal-difference learning (Tesauro, 1992, 2002). 

We simply use linear regression for prediction because we could use a method 
that iteratively adjusts the weights in a linear model, even when a very large 
number of features are used (Barrett et al., 1994). Other prediction models, 
such as neural networks, could be used with our method, too. 

Logical features are general and have been applied to many games, including
Othello and a single-agent search problem (Fawcett, 1993), symmetric
chess-like games (Pell, 1993), and a variant of Shogi (Kaneko, Yamaguchi,
and Kawai, 2002). However, the cost of evaluating positions with logical
features is prohibitive because logic programs are slow to evaluate, despite
recent efforts that have increased speeds more than 4,000 times (Kaneko,
Yamaguchi, and Kawai, 2000, 2001).

Buro (2002) used patterns in fixed shapes. This is effective in achieving 
highly efficient pattern matching, even when a large number of patterns is 
involved. However, there is no established method of identifying effective 
shapes mechanically, and we do not know whether patterns in such fixed shapes 
are useful in other games. Kojima, Ueda, and Nagano’s (1997) method acquired
patterns from game records in Go through genetic programming. This requires
game-specific adaptation to apply it to other games because it depends on the
importance of adjacent stones.

Initial position (left):
owns(d5, x). owns(e4, x).
owns(d4, o). owns(e5, o).
blank(a1). blank(a2). ...

Position after Black has played c4 (right):
owns(c4, x). owns(d4, x).
owns(d5, x). owns(e4, x).
owns(e5, o).
blank(a1). blank(a2). ...

Figure 1. Othello initial position (left) and position after Black has played c4 (right). Facts
below each board define the position shown above.

We recently developed a method of generating patterns from logical features 
(Kaneko et al., 2001). However, the accuracies of the generated evaluation 
functions did not reach those that Buro obtained. This is because we only used 
about 4,000 patterns, while Buro used about 200,000. For our method it was 
impossible to provide a sufficient number of useful patterns because effective 
methods of selection were up to then unknown. 

The selection of features is a central research topic in artificial intelligence,
and many methods have been developed (Guyon and Elisseeff, 2003; Jain, Duin,
and Mao, 2000). It is a combinatorial optimization problem. Heuristics are
essential because the computational cost of identifying an optimal pattern subset
is known to be exponential in the number of candidates (Jain, Duin,
and Mao, 2000). Such costs are not acceptable. Moreover, popular selection
methods such as the F-test in statistics cannot be used here. To illustrate this
difficulty, we used about eight million candidates in the experiments that will
be described later. Obviously, their covariances (more than 3 x 10^13 pairs)
cannot be stored on normal computers.



3. Basic Terminology 

This section introduces the basic terminology, including the specifications
of a game written in logic (Subsection 3.1), the logical features (Subsection 3.2)
and the definition of patterns (Subsection 3.3).

3.1 Positions and Domain Theory 

A position is an intermediate status of a game. It is described by a set of
special facts. A fact is a clause without a body. In Othello, owns and blank
are used to represent a position. To demonstrate this, we have shown the facts
defined in the initial position in Othello and the position after Black has played
c4 in Figure 1. Here, Black is denoted by x, and White is denoted by o. In the
initial position, owns(d5,x), owns(e4,x), owns(d4,o), and owns(e5,o)
are defined for squares with a disc, and blank is defined for each empty square.

legal_move(S, Player) :- square(S), bs(S, _End, Player).
bs(S1, S3, P) :- blank(S1), opponent(P, Opp),
    neighbour(S1, D, S2), span(S2, S3, D, Opp),
    neighbour(S3, D, S4), owns(P, S4).
span(S1, S2, D, Owner) :-
    square(S1), square(S2), player(Owner), owns(Owner, S1),
    neighbour(S1, D, S3), span(S3, S2, D, Owner).
span(S, S, D, Owner) :-
    square(S), player(Owner), owns(Owner, S), direction(D).
line(S, S, D) :- square(S), direction(D).
line(From, To, D) :- neighbour(From, D, Next), line(Next, To, D).
opponent(x, o). opponent(o, x).
direction(n). direction(ne). direction(e). direction(se).
direction(s). direction(sw). direction(w). direction(nw).
square(a1). square(a2). square(a3). (...)
square(d2). square(d3). square(d4). (...)
neighbour(a1, s, a2). neighbour(a2, n, a1).
neighbour(a2, s, a3). neighbour(a3, n, a2). (...)
neighbour(c4, ne, d3). neighbour(d3, sw, c4). (...)

Figure 2. Sample domain theory for Othello.

The main part of the specifications of a game consists of the rules of the 
game and the goal conditions. This is called domain theory and described by 
a set of Horn Clauses. The example Othello domain theory in Figure 2 is used 
throughout this paper. 

3.2 Logical Features 

Logical features are defined as Horn Clauses of the predicate logic where
predicates in their body are defined by domain theory or position. The following
clause is an example of a logical feature. 1

f(A) :- owns(x, A). % pieces for Black

1 This is written as “f(N) :- count([A], (owns(x,A)), N)” in Fawcett (1993). In this paper, “count” has been
assumed to be the default semantics of logical features and has therefore been omitted.

The value of a logical feature for a state is defined as the number of solutions,
where a solution is a binding of constants to variables that makes the
clause true. In the above feature, A is a variable, and the solutions in the initial
position in Figure 1 (left) are d5 and e4 (two solutions), so the value is two, the number
of squares currently owned by Black.

3.3 Patterns 

A pattern is defined as a conjunction of facts describing a part of a position.
The value of a pattern is 0 or 1 according to its Boolean value; in a given
position the value of a fact is 1 if it is defined (or 0 if undefined). For example,
the following is a pattern.

blank(a1) ∧ owns(x,a2) ∧ owns(o,a3)

This pattern is a logical formula for “White can play on square a1.”
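
A minimal sketch of how the value of a pattern could be computed, assuming that a position is stored as a set of fact tuples (a representation chosen here for illustration, not prescribed by the paper):

```python
def pattern_value(pattern, position):
    """Return 1 if every fact of the conjunction is defined in the position, else 0."""
    return int(all(fact in position for fact in pattern))


# The example pattern: blank(a1) AND owns(x,a2) AND owns(o,a3)
pattern = [("blank", "a1"), ("owns", "x", "a2"), ("owns", "o", "a3")]

# A made-up fragment of a position in which the pattern holds.
position = {("blank", "a1"), ("owns", "x", "a2"), ("owns", "o", "a3"), ("blank", "b1")}

print(pattern_value(pattern, position))           # 1
print(pattern_value(pattern, {("blank", "a1")}))  # 0
```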

4. Pattern Generation 

Patterns are generated through the following steps: 

1 generation of logical features with Fawcett’s (1993) method, 

2 translation of logical features into propositional logic by unfolding, and 

3 extraction of patterns from propositional logic. 

First, logical features are generated by means of syntactic translation of 
Horn Clauses, which are extracted from the domain theory of a target game. 
For example, the following feature (called a mobility feature) can be generated. 

f(A) :- legal_move(A, o). % mobility for White

Complex features can be generated by taking the preconditions of existing 
features. Fawcett (1993) has more details on automated construction. 

In the next step, generated features are translated into propositional logic 
by unfolding. This is a technique in partial evaluation of logic programming 
(Bossi, Cocco, and Dullie, 1990), and is repeatedly applied until features only 
consist of ground facts. In conventional games with reasonable rules, it is 
easy to write a domain theory so that the unfolding of generated features stops 
even if they contain recursively defined clauses, due to the finiteness of the 
number of squares and satisfiable terms. Detailed translation methods have 
been described by Kaneko et al. (2001). The following clauses are part of the 
results we obtained for the unfolding of the feature in the above example. 

legal_move(a1,o) :- blank(a1), owns(x,a2), owns(o,a3).

legal_move(a1,o) :- blank(a1), owns(x,b1), owns(o,c1).

legal_move(a1,o) :- blank(a1), owns(x,b2), owns(o,c3).

Finally, we extracted patterns from the unfolded features simply by taking
the conjunctive part of their propositional formulae. The following formulae
are patterns extracted from the unfolded features listed above.

■ blank(a1) ∧ owns(x,a2) ∧ owns(o,a3)

■ blank(a1) ∧ owns(x,b1) ∧ owns(o,c1)

■ blank(a1) ∧ owns(x,b2) ∧ owns(o,c3)

Each pattern has a corresponding clause whose body (the right-hand side of the clause)
is equivalent to the pattern.
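
The effect of unfolding the mobility feature can be imitated without a logic engine. The sketch below enumerates, for one square and one player, the ground conjunctions that the bs and span clauses of Figure 2 admit: a blank square, one or more opponent discs in a direction, then an own disc. It is a hand-written specialisation for illustration only; the function names and the tuple encoding are assumptions, and the general translation method is the one described by Kaneko et al. (2001).

```python
FILES = "abcdefgh"
# Direction vectors (file step, rank step). "s" increases the rank number,
# matching neighbour(a1, s, a2) in Figure 2.
DIRECTIONS = {"n": (0, -1), "ne": (1, -1), "e": (1, 0), "se": (1, 1),
              "s": (0, 1), "sw": (-1, 1), "w": (-1, 0), "nw": (-1, -1)}


def ray(square, direction):
    """Squares reached from `square` by repeatedly stepping in `direction`."""
    df, dr = DIRECTIONS[direction]
    f, r = FILES.index(square[0]), int(square[1])
    out = []
    while True:
        f, r = f + df, r + dr
        if not (0 <= f < 8 and 1 <= r <= 8):
            return out
        out.append(FILES[f] + str(r))


def unfold_legal_move(square, player):
    """Ground conjunctions that each make legal_move(square, player) true."""
    opp = "x" if player == "o" else "o"
    patterns = []
    for d in DIRECTIONS:
        line = ray(square, d)
        for k in range(1, len(line)):  # k opponent discs, then one own disc
            facts = [("blank", square)]
            facts += [("owns", opp, s) for s in line[:k]]
            facts.append(("owns", player, line[k]))
            patterns.append(tuple(facts))
    return patterns


patterns = unfold_legal_move("a1", "o")
print(len(patterns))  # 18: three usable directions, spans of 1 to 6 discs
print(patterns[0])
```

The three patterns listed in the text are the shortest members of this set, one for each direction that stays on the board from a1.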

5. Pattern Matching 

Below, we briefly discuss a pattern matching method to justify the selection 
method of the next section. The purpose of the selection is to identify sets of 
patterns that produce efficient and accurate evaluation functions, where their 
efficiency depends on how the patterns are evaluated. Basic ideas in efficient 
matching are (1) performing incremental calculations and (2) utilizing a partial 
order on patterns. 

5.1 Incremental Matching with a Diagram 

Incremental matching was efficiently implemented with a Hasse diagram
(Gries and Schneider, 1993) on the partial order of patterns, as outlined in Figure
3. Let each of a, b, and c be a fact describing a position (such as blank(a1)), and
consider that there are six patterns {abc, ab, bc, a, b, and c}. Here, ab means
the conjunction of a and b. In the figure, a pattern is denoted by a square, and
a fact is denoted by a circle. For each pattern, the question whether matching
is required can quickly be determined by using the diagram. For example,
matching of pattern ‘abc’ is only required when the value of pattern ‘ab’ or ‘bc’
changes.




Figure 3. Hasse diagram of sample patterns. 



The computational costs of incremental matching can be estimated by the 
number of nodes visited. Because each edge will be visited once at most, the 
cost for the worst case is proportional to the number of edges. Cube extraction 
(Rudell, 1996) was applied to a diagram here to reduce edges, as well as other 
optimizations. Details are discussed in Kaneko et al. (2001).

5.2 Counters for Matching

To speed up matching of individual patterns, an integer counter cur(p) was
associated with each pattern p such that the matching was determined by integer
comparison instead of naively computing the logical conjunction of each fact
in the pattern.

Let dep(p) (upd(p)) be the children (parents) of pattern p in a diagram. Counter
cur(p) is defined as the number of children of p whose current value is true.
Then, as long as cur(p) is properly maintained, the Boolean value of cur(p) =
|dep(p)| coincides with the value of pattern p.
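
The counter scheme can be sketched as follows for the six sample patterns of Subsection 5.1. The class name Diagram, the leaf names fa, fb, fc, and the visited counter are assumptions introduced for this illustration; the optimizations mentioned above (such as cube extraction) are not modelled.

```python
from collections import defaultdict


class Diagram:
    """Hasse-diagram matcher that keeps cur(p) up to date incrementally."""

    def __init__(self, dep):
        self.dep = dep                       # pattern -> children (facts or patterns)
        self.upd = defaultdict(list)         # node -> parents
        for p, children in dep.items():
            for c in children:
                self.upd[c].append(p)
        self.cur = {p: 0 for p in dep}       # number of children currently true
        self.fact_value = defaultdict(bool)  # truth values of the leaf facts
        self.visited = 0                     # nodes touched by the last update

    def value(self, node):
        if node in self.dep:                 # pattern: integer comparison only
            return self.cur[node] == len(self.dep[node])
        return self.fact_value[node]

    def set_fact(self, fact, new_value):
        self.visited = 0
        if self.fact_value[fact] != new_value:
            self.fact_value[fact] = new_value
            self._propagate(fact, 1 if new_value else -1)

    def _propagate(self, node, delta):
        for parent in self.upd[node]:
            self.visited += 1
            before = self.value(parent)
            self.cur[parent] += delta
            after = self.value(parent)
            if before != after:              # only changed patterns notify parents
                self._propagate(parent, 1 if after else -1)


d = Diagram({"a": ["fa"], "b": ["fb"], "c": ["fc"],
             "ab": ["a", "b"], "bc": ["b", "c"], "abc": ["ab", "bc"]})
for f in ("fa", "fb", "fc"):
    d.set_fact(f, True)
print(d.value("abc"))                                # True
d.set_fact("fa", False)
print(d.value("ab"), d.value("bc"), d.value("abc"))  # False True False
```

Pattern abc is re-examined only when ab or bc changes value, which is the behaviour described in Subsection 5.1.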

6. Pattern Selection 

This section introduces a lightweight selection method, which consists of 
two methods that are computationally inexpensive. These are: 

■ preliminary filtering by using the frequency of patterns, and 

■ approximated forward selection by assessing the contribution of patterns 
to the accuracy of a prediction model. 

The latter method takes into account the accuracy of a linear model that uses 
selected patterns. Consequently, it requires that the model is trained (by weight 
fitting) for each subset of patterns; thus the method is relatively expensive. The 
former method is more efficient because it only uses the frequency of each 
pattern. However, it cannot be used to select useful patterns by itself. Hence, 
we first need to filter the candidates with the former method, and then select 
useful patterns with the latter method, to reduce its weight-fitting time. 

6.1 Preliminary Filtering by Frequency 

First, useless patterns are heuristically determined and filtered out by analysing
their frequency, before approximated forward selection is done in the next step.
There are two background considerations: (1) if low-frequency patterns are
used, the efficiency of evaluating positions by using the method detailed in
Section 5 will improve, 2 and (2) the use of extremely low-frequency patterns tends
to cause over-fitting. We claim that high- or extremely low-frequency patterns
can safely be rejected without a loss of quality in the generated evaluation
functions. Although this may seem similar to the filtering in existing work (e.g.,
Kojima et al., 1997; Buro, 1998), our approach is different in the sense that
those authors do not explicitly reject high-frequency patterns as we do. 3

2 More precisely, it is better to measure the frequency at which the patterns change from one position to
another to improve search efficiency. We used the frequency of the patterns themselves, because this could
be measured more easily. Moreover, to reduce the computational costs of weight fitting, reducing the
frequency of the patterns themselves is also essential, as discussed in Subsection 7.2.2.

Figure 4. Histogram for frequency of matching positions.

We measured the frequency of part of Buro’s (2002) horizontal patterns to 
estimate an appropriate frequency range (Figure 4). Because the highest fre- 
quency in Buro’s patterns was 0.075, we expected that good evaluation functions 
could be generated with only patterns with a frequency below this value. Figure 
4 also has the results of measurement for our patterns. It can be seen that many 
patterns can be filtered out by frequency. The preferable frequency ranges were 
determined by the experiments, which are discussed in Section 7. 
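
A minimal sketch of frequency filtering, assuming that the frequency of a pattern is the fraction of sample positions it matches. The default bounds are the range reported in Subsection 7.2.3; the function names and the toy data are illustrative assumptions.

```python
def frequency(pattern, positions):
    """Fraction of positions in which the pattern (a conjunction of facts) holds."""
    hits = sum(1 for pos in positions if all(f in pos for f in pattern))
    return hits / len(positions)


def filter_by_frequency(patterns, positions, low=1.25e-5, high=0.075):
    """Keep only the patterns whose matching frequency lies in [low, high]."""
    return [p for p in patterns if low <= frequency(p, positions) <= high]


# Toy data: fact "p" holds everywhere, "q" in 2 of 40 positions, "r" nowhere.
positions = [{"p", "q"} if i < 2 else {"p"} for i in range(40)]
candidates = [("p",), ("q",), ("r",)]

print(filter_by_frequency(candidates, positions))  # [('q',)]
```

The pattern that always matches is rejected as too frequent, and the pattern that never matches is rejected as too rare.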



6.2 Approximated Forward Selection 

After filtering, we applied a method of statistically selecting explanatory
variables, by treating a pattern as a binary variable. Approximated sequential
forward selection was chosen from among many existing methods (Guyon and
Elisseeff, 2003). It is so efficient that it was already used for manual computation
before computers became widely available (Okuno et al., 1981).

The algorithm is listed in Figure 5. It is used to select a subset of variables
(S) that are effective in predicting a target variable (y_0), from a set of candidates
(i.e., patterns, X). A target variable is the difference between the number of
black and white discs (explained below in the experiments).

One pattern (x_{a_i}) is added to the selected set (S) at the seventh line for each
loop, as in sequential forward selection. A priority function, also explained
later, is used to select a pattern. Let n be the number of candidates and m be the
number of patterns finally selected. Because variables in S are never removed,
the method tries m subsets of candidates, which is far less than the possible
number of subsets, 2^n.



3 Kojima et al.’s (1997) method and the inductive algorithm proposed by Buro (1998), which was not used in
preparing the evaluation functions for Logistello, tend to discard high-frequency patterns because they
prefer specific patterns in matching. As patterns for specific given shapes contained at least eight squares,
high-frequency patterns were not used in constructing evaluation functions for Logistello.

// (input)  y_0 : a target variable,
//          X = {x_0, x_1, ..., x_p} : a set of explanatory variables
// (output) S : a set of selected variables
//          residuals after selection
i <- 0, S <- ∅
while (termination criterion is not satisfied)
    pick x_{a_i} of the highest priority
    compute a_{a_i}, b_{a_i} by univariate regression s.t. a_{a_i} x_{a_i} + b_{a_i} predicts y_i
    y_{i+1} <- y_i - (a_{a_i} x_{a_i} + b_{a_i})    // residuals
    S <- S ∪ {x_{a_i}}
    i <- i + 1

Figure 5. Approximated forward selection algorithm.

A priority function is used to estimate the usefulness of the pattern for
selection at the seventh line, and this should be chosen carefully, taking the purpose
of selection into consideration. For practical game programming, efficiency
and accuracy should be taken into account to estimate the usefulness of a
pattern in terms of priority. In this paper, we used the correlation with the residuals
y_i after the i-th regression as a priority function to achieve accuracy.

Here, if explanatory variables have no correlation with one another, variables
selected with this method are equivalent to the ones selected by normal
sequential forward selection, where the multiple regression coefficient in predicting y_0
using all variables in S is used as the priority function. 4 The order of candidates
affects the results in other cases (Okuno et al., 1981). However, this method
is more efficient than sequential forward selection because it uses univariate
regression instead of multivariate regression.

// (input)  y'_0 : a target variable
//          X_0, X_1, ..., X_n : sets of variables
// (output) R : selected variables
R <- ∅
for each i in 0, ..., n
    (S, y'_{i+1}) <- approximated forward selection(y'_i, X_i)
    R <- R ∪ S

Figure 6. Iterative selection algorithm. 
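
Figures 5 and 6 can be sketched together in Python as follows. This is an illustrative reading of the pseudocode, not the authors' C++ program: the priority is the absolute correlation with the current residuals, the loop stops when the best priority drops below a threshold, and all function names and the toy data are assumptions.

```python
import math


def correlation(x, y):
    """Pearson correlation; defined as 0.0 when either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((u - mx) * (v - my) for u, v in zip(x, y))
    sxx = sum((u - mx) ** 2 for u in x)
    syy = sum((v - my) ** 2 for v in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def fit_line(x, y):
    """Univariate least squares: returns (a, b) such that a * x + b predicts y."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((u - mx) ** 2 for u in x)
    if sxx == 0:
        return 0.0, my
    a = sum((u - mx) * (v - my) for u, v in zip(x, y)) / sxx
    return a, my - a * mx


def approximated_forward_selection(y, X, threshold):
    """Figure 5. X maps a pattern name to its 0/1 column over the positions."""
    residual, selected, remaining = list(y), [], dict(X)
    while remaining:
        name = max(remaining, key=lambda k: abs(correlation(remaining[k], residual)))
        if abs(correlation(remaining[name], residual)) < threshold:
            break                                  # termination criterion
        x = remaining.pop(name)
        a, b = fit_line(x, residual)
        residual = [r - (a * v + b) for r, v in zip(residual, x)]
        selected.append(name)
    return selected, residual


def iterative_selection(y0, chunks, threshold):
    """Figure 6. Apply the selection to each candidate set, passing residuals on."""
    chosen, residual = [], list(y0)
    for X in chunks:
        S, residual = approximated_forward_selection(residual, X, threshold)
        chosen += S
    return chosen, residual


# Toy data with mutually uncorrelated columns; y0 = 4 * x1 + 2 * x2.
x1 = [1, 1, 1, 1, 0, 0, 0, 0]
x2 = [1, 1, 0, 0, 1, 1, 0, 0]
x3 = [1, 0, 1, 0, 1, 0, 1, 0]
y0 = [4 * u + 2 * v for u, v in zip(x1, x2)]

chosen, residual = iterative_selection(y0, [{"x1": x1, "x3": x3}, {"x2": x2}], 0.1)
print(chosen)  # ['x1', 'x2']
```

Because the toy columns are uncorrelated, the accumulated univariate fits reproduce the target exactly, in line with the equivalence noted above for uncorrelated explanatory variables.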

4 The method approximates sequential forward selection by using the accumulation of univariate regressions
instead of multivariate regression.

The improved computation applied so far leads to appropriate results. Yet,
the most expensive computation is to determine the priority (i.e., correlation) of
each pattern in each loop. Naively, it requires pattern matching over all training
positions for every loop, but then the computational costs are unacceptable. The
priority of each pattern can be incrementally updated by means of a table holding
the number of pattern co-occurrences if there are not too many candidates.
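
One way such a table can be used is sketched below; this is an assumed reading, since the update rule itself is not spelled out here. For binary patterns, let N[j] be the number of positions matching pattern j, C[j][s] the number of positions matching both j and s, and T[j] the sum of the residuals over the positions matching j. After the residuals change from y to y - (a x_s + b), the new sum is T[j] - a C[j][s] - b N[j], so no pattern matching is needed. The remaining terms of the correlation (the mean and the sum of squares of the residuals) do not depend on j. All names are hypothetical.

```python
def cooccurrence_table(columns):
    """C[j][s] = number of positions in which patterns j and s both match."""
    return {j: {s: sum(u * v for u, v in zip(columns[j], columns[s]))
                for s in columns}
            for j in columns}


def update_cross_sums(T, C, N, selected, a, b):
    """Update T[j] = sum_k x_j[k] * y[k] for y <- y - (a * x_selected + b)."""
    return {j: T[j] - a * C[j][selected] - b * N[j] for j in T}


# Toy 0/1 columns over six positions and a toy target.
columns = {"p": [1, 1, 0, 1, 0, 0], "q": [1, 0, 0, 1, 1, 0], "r": [0, 1, 1, 0, 0, 1]}
y = [5.0, 2.0, -1.0, 4.0, 1.0, -3.0]

N = {j: sum(col) for j, col in columns.items()}
C = cooccurrence_table(columns)
T = {j: sum(u * v for u, v in zip(col, y)) for j, col in columns.items()}

a, b = 2.5, -0.5  # coefficients of a univariate fit on "p" (arbitrary values here)
T_fast = update_cross_sums(T, C, N, "p", a, b)

# Reference: recompute the sums from the explicit residual vector.
residual = [v - (a * u + b) for u, v in zip(columns["p"], y)]
T_slow = {j: sum(u * v for u, v in zip(col, residual)) for j, col in columns.items()}
print(all(abs(T_fast[j] - T_slow[j]) < 1e-9 for j in columns))  # True
```

The table has one entry per pair of candidates, which is why the approach is only practical when the candidate set is of moderate size.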

Thus, to avoid frequent pattern matching, patterns were split into sets of a
moderate number of patterns {X_0, X_1, ..., X_n} in advance, and approximated
forward selection was iteratively applied to each X_i in turn, as shown in Figure
6. A test of whether the priority of the selected pattern had dropped below
a given threshold worked well as a termination criterion in each approximated
forward selection. We selected variables from X_0 up to a given threshold, and
then selected variables from X_1 up to the given threshold. This step was repeated
up to X_n. Preferable priority thresholds were estimated in the experiments and are
discussed in Section 7. There were 1,000 candidates (X_i) in each approximated
forward selection in our experiments. Although accuracy improves with greater
numbers, only slight improvements could be observed for 4,000 candidates in
our experiments.

7. Experimental Results 

We did experiments on Othello to prove the effectiveness of the generation 
and selection methods proposed. We compared evaluation functions generated 
by our methods with those generated by other general methods, and with the 
evaluation functions used in specialized Othello programs. We used a computer 
with an Athlon MP 2100+ CPU (1.7 GHz) for these experiments. The program 
was implemented in GNU C++. 

7.1 Pattern Generation and Selection 

First, 11,079 logical features were generated by Fawcett’s (1993) method. 
Subsequently, 8,502,664 unique patterns were extracted from the logical fea- 
tures with the method proposed in Section 4. We then did selection by frequency 
as described in Subsection 6.1. Several sets of patterns were selected with var- 
ious frequency ranges. Finally, we applied the iterative selection described in 
Subsection 6.2 to the resulting sets with various priority thresholds. The priority 
function used here was correlation, and candidates were sorted by frequency. 

7.2 Accuracy of Evaluation Functions 

This subsection contains the heart of our experimental research. It is subdi- 
vided into six sub-subsections, each of them dealing with a relevant item. 

7.2.1 Training Positions and Labeling. Evaluation functions made
up of selected patterns were constructed to enable the usefulness of patterns
to be estimated. Each of the functions was a linear model of patterns. The
weights were adjusted by means of least mean squares to predict the final score
(difference between number of black and white discs at the end of the game
after both players had played the best moves). We separately constructed the
evaluation functions for the positions of 60 discs and those for 55 discs. We
only used the positions of 60 and 55 discs because positions near the
endgame can be immediately labeled with the results of a complete search.

The positions we used in selection and training were extracted from games
played between Logistello and Kitty. 5 It should be noted that our
proposed method works without the game records of strong game programs. The
purpose of using positions taken directly from games is to gather unbiased
positions and to demonstrate the method’s learning ability in positions that strong
programs face. About 50,000 positions were selected by eliminating duplicate
positions considering the symmetry of the geometry and players. We then
generated two disjoint sets of positions expanding the symmetric ones. 6 One set
contained about 800,000 positions for training and the other had about 6,000
positions for testing.
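
Duplicate elimination under symmetry can be sketched by mapping every position to a canonical representative. The sketch below assumes that "symmetry of the geometry and players" means the eight rotations and reflections of the board combined with an exchange of colours (16 variants in total); the board encoding and function names are illustrative assumptions. When colours are exchanged, the label (disc difference) would change sign as well, which is not shown.

```python
def transforms(board):
    """Yield the 8 rotations/reflections of a board given as a tuple of 8 strings."""
    rows = list(board)
    for _ in range(4):
        rows = ["".join(col) for col in zip(*rows[::-1])]  # rotate by 90 degrees
        yield tuple(rows)
        yield tuple(r[::-1] for r in rows)                 # mirrored copy


def swap_colours(board):
    """Exchange the discs of x and o."""
    table = str.maketrans("xo", "ox")
    return tuple(r.translate(table) for r in board)


def canonical(board):
    """Smallest representative among the 16 symmetric variants."""
    return min(list(transforms(board)) + list(transforms(swap_colours(board))))


def deduplicate(boards):
    """Keep one board per symmetry class, preserving first occurrences."""
    seen, unique = set(), []
    for b in boards:
        key = canonical(b)
        if key not in seen:
            seen.add(key)
            unique.append(b)
    return unique


empty = "........"
# Rank 1 first; d4 = o, e4 = x, d5 = x, e5 = o as in Figure 1 (left).
start = (empty, empty, empty, "...ox...", "...xo...", empty, empty, empty)

print(len(set(transforms(start))))                    # 2
print(len(deduplicate([start, swap_colours(start)]))) # 1
```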

7.2.2 Adjustment of Weights. Weights in evaluation functions with
fewer than 10,000 patterns were adjusted with LAPACK; 7 in other cases an iterative
method (BiCGSTAB 8) (Barrett et al., 1994) was used instead, due to memory
limitations. The time for weight fitting depends on the efficiency
of an evaluation function and on the number of matching patterns for
a position on average. This efficiency was primarily important because the
iterative method requires pattern matching over all training positions for many
repetitions. The number of multiplications required for each position is about
the number of matching patterns squared. Thus, it was not feasible to use all
patterns generated without selection. The time for weight fitting tended to be
more than a week if there were more than 100,000 patterns. Buro could use
more patterns because his pattern matching is much more efficient, as will be described below,
and because only 50 patterns at most should match each position due to the
carefully crafted shapes.

7.2.3 Accuracy of Proposed Evaluation Functions. The graph in
Figure 7 illustrates the accuracy of our evaluation functions and the others. Here,
“error” in the vertical axis is the square root of mean square errors. The
horizontal axis plots the number of patterns on a logarithmic scale. Our evaluation
functions (“with selection”) for positions with 60 discs are denoted by the ’+’,
and those for positions with 55 discs are denoted by a different marker. The errors for 55
discs are larger than those for 60 discs, because it is more difficult to predict
scores for the positions of earlier game stages. The frequency range used in
filtering was [1.25 · 10^-5, 0.075]; 540,724 patterns were selected. The priority
thresholds used were 0.000125, 0.0005, 0.00125, 0.005, and 0.01; we selected
approximately 3,000 to 140,000 patterns. The accuracy of our evaluation
functions improved as the number of patterns increased.

5 Both are available at ftp://external.nj.nec.com/pub/igord/IOS/misc/.

6 To generate evaluation functions that yield the same value at symmetric positions, symmetric patterns should
have the same weight in the evaluation functions. We achieved this by simply instantiating all symmetric
positions when adjusting weights.

7 http://www.netlib.org/lapack/

8 http://netlib2.cs.utk.edu/linalg/html_templates/Templates.html

7.2.4 Comparison with Logical Features or All Patterns. We com- 
pared our evaluation functions with those using logical features and those using 
all patterns without selection to demonstrate improvements over existing gen- 
eral methods. 

We have already reported on a comparison of all patterns and logical features 
(Kaneko et al., 2001). The accuracy of evaluation functions using 18 logical 
features that Fawcett had selected was 12.9 and 12.5, and the accuracy of 
evaluation functions with 42 logical features that were statistically significant 
and selected with an F-test from 10,000 features was 8.90 and 12.4 for positions 
with 60 discs and 55 discs, respectively. Evaluation functions with logical 
features were more than 20 times slower than those with patterns extracted 
from the same logical features. The results indicate that extracted patterns are 
much more effective than the logical features themselves. 

In Figure 7, “without selection” means the accuracy of evaluation functions
that use automatically generated patterns without selection. The accuracy of
our evaluation functions was far better than that of patterns without selection
(plotted with ’^’ and ’•’). The accuracy of the latter functions was established
in, and is taken from, the authors’ previous work (Kaneko et al., 2001). The results
indicate that the proposed methods are more effective than existing general
methods.

7.2.5 Comparison with Buro’s Patterns. We compared our evaluation
functions with those of a specialized Othello program to evaluate our accuracy.
In Figure 7, “Buro” means our previous reproduction of Buro’s method (Kaneko
et al., 2001). The accuracy of our evaluation functions improves as the number
of patterns increases, going beyond that of Buro’s (plotted with ’O’ and
’X’). The results indicate that accurate evaluation functions can be generated
mechanically, without manually incorporating important shapes in Othello.

7.2.6 Comparison with Randomly Generated or Selected Patterns. 

To demonstrate the importance of both pattern generation and selection, we 
constructed evaluation functions with random generation/selection instead of 
the proposed generation/selection, and compared their accuracies. 

Random Generation + Proposed Selection. In Figure 7, “random + selec- 
tion” means evaluation functions that use patterns selected with our method, 
from randomly generated patterns instead of the ones generated by this method. 
First, 8,502,664 patterns were generated, each of which was a conjunction of 
the randomly selected status of squares. Then, about 3,000 and 6,000 patterns 
were selected with the selection we propose. The difference between the accu- 
racy of randomly generated patterns and that of ours means that our method of 
generating patterns is indispensable in producing useful patterns. 

Proposed Generation + Random Selection. We measured the accuracy of
evaluation functions with 4,147 patterns that were randomly selected instead of 
with the selection we propose. The error was more than 14.5 and is not plotted 
in the graph. The difference between the accuracy of randomly selected patterns 
and that of our method means that our pattern selection method is indispensable 
in producing useful patterns. 

7.3 Efficiency of Evaluation Functions 

Figure 8 illustrates the efficiency and accuracy of our evaluation functions 
selected for various frequency ranges. The horizontal axis plots the number of 
patterns used in the evaluation functions and the vertical axis plots efficiency 
by the number of positions evaluated in one second. The priority thresholds 
we used were 0.000125, 0.0005, 0.00125, 0.005, and 0.01. As the number 
of patterns increased, the efficiency of evaluation functions deteriorated while 
the accuracy improved, almost regardless of frequency ranges. For this experiment,
we collected a sequence of about 3,000,000 positions. Then the df-pn+
search (Nagai and Imai, 1999) visited the root positions of 49 discs, which were
extracted from 23 matches in IOS records. 9

Figure 8. Efficiency of evaluation functions for various numbers of patterns (55 and 60 discs).

Figure 9. Accuracy of evaluation functions with various priority thresholds (55 and 60 discs).

Although the efficiency of our evaluation functions was much better than 
the efficiency of evaluation by logical features (Kaneko et al., 2001), it was 
worse than that of a specialized Othello program. Logistello’s speed was 
about 270,000 nodes/sec when running on a Pentium-II 333 MHz (Buro, 1998). 
This speed would have been about 1.4 million nodes/sec (by extrapolation) 
if it had been run on a 1.7-GHz CPU. Further research is required to make 
practical evaluation functions because efficiency is usually more important than 
accuracy. 10 These differences were partly because we did not take efficiency 
into account in the selection of patterns and partly because we could have used 
a much more efficient pattern matching algorithm than the one we proposed if 
we had restricted our patterns to Buro’s (1998) shapes. 
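
The extrapolated figure quoted above can be reproduced by scaling the measured speed linearly with clock frequency. Linear scaling is an assumption made here for illustration; it ignores differences in cache and memory speed between the two machines.

```latex
270{,}000\ \text{nodes/s} \times \frac{1700\ \text{MHz}}{333\ \text{MHz}}
  \approx 1.38 \times 10^{6}\ \text{nodes/s} \approx 1.4\ \text{million nodes/s}
```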



7.4 Parameters for Selection 

To determine appropriate values for frequency ranges and priority thresholds 
so that the proposed selection would work well, we investigated their influence 
on the efficiency and accuracy of the generated evaluation functions and on the 
time required for selection. 

The graphs in Figure 9 plot the accuracy of our evaluation functions for posi- 
tions with 60 and 55 discs, consisting of patterns selected with various frequency 
ranges and various priority thresholds. We can see that the frequency ranges
do not distinctly affect the quality of selected patterns if their upper boundary is
greater than 0.15. Thus, we concluded that the accuracy of evaluation functions
is mainly determined by the number of patterns used in them. 

The priority thresholds used in selection determine the number of patterns 
that are finally selected. Figure 10 plots the relation between the number of 



9 These are available at ftp://external.nj.nec.com/pub/igord/othello/ios/.

10 Future advances in hardware will favour the accuracy because these will eventually compensate for serious
delays when in-depth searches reach a saturation point (Heinz, 2001).

Figure 11. Time for iterative selection according to frequency ranges.

selected patterns and priority thresholds. The vertical axis plots the number 
of patterns on a logarithmic scale, and the horizontal axis plots the priority 
thresholds. Here, we used correlation for priority. We can see that the number 
of patterns selected is mainly determined by priority thresholds regardless of 
frequency ranges (denoted by symbols), and that the symbols in the graph are 
plotted at almost the same location if the same priority thresholds are used. 
Also, larger numbers of patterns are selected as lower thresholds are used. 
Thus, one can control the trade-off between the accuracy and efficiency of
evaluation functions by adjusting the priority thresholds, because both are
mainly determined by the number of patterns used, as previously discussed.

The time for iterative selection depends on frequency ranges as well as the 
number of selected patterns. Figure 11 plots the relation between time and
the number of selected patterns with various priority thresholds and frequency 
ranges. The priority thresholds we used were 0.000125, 0.0005, 0.00125, 0.005, 
and 0.01. The horizontal axis plots the number of patterns finally selected by 
iterative selection, and the vertical axis plots the time for selection in minutes. 
These results are acceptable because we have to inspect a larger number of 
candidates during iterative selection for frequency ranges with larger upper 
bounds. 

Figure 12 plots the relation between the efficiency and accuracy of evaluation
functions. The vertical axis plots accuracy as the root mean square error,
and the horizontal axis plots efficiency as the number of positions
evaluated in one second. Points toward the lower right are to be preferred.

Considering the time for selection, accuracy, and efficiency of evaluation
functions, the recommended upper boundary for the frequency range is between
0.15 and 0.3. This value is clearly larger than the expected value
0.075 in Figure 4. This is partly because most of our patterns had fewer squares
than Buro’s (1998).

Figure 12. Efficiency and accuracy of evaluation functions (55 and 60 discs).



8. Concluding Remarks 

In this paper, we described a method of constructing accurate evaluation 
functions by using only the specifications of a target game and a set of training 
positions, which is crucial in constructing a general game player. Experiments 
on Othello revealed that a combination of pattern generation using logic and 
a lightweight pattern selection could efficiently search for and identify useful 
patterns. The method actually constructed accurate evaluation functions. The 
accuracy was far superior to that of the evaluation functions generated by existing
general methods, and was comparable to (although slightly worse than) that of Buro’s
(2002), which is part of a specialized Othello program.

Our intended future work aims at demonstrating the generality of the ap- 
proach proposed here on other games, such as Shogi, where patterns with 
variable shapes are needed, and also at improving the efficiency of the gen- 
erated evaluation functions in order to investigate total game-playing perfor- 
mance. The development of selection criteria taking efficiency into account 
seems promising, though investigations into their impact on accuracy would 
be required. It would also be challenging to develop a general method that 
introduces game-specific optimizations, including the use of patterns in fixed 
shapes, through an analysis of domain theory.

Abstract Amazons is a fascinating game that shares properties of chess and Go. We have
written a computer program that plays Amazons. This paper reveals the secret of 
this program: its evaluation function. We describe it by explicit formulas, men- 
tion the ideas and goals behind these formulas, and discuss possible refinements. 
By analysing a tournament game of Amazong against the former computer 
world champion 8QP we illustrate how the new features of our evaluation func- 
tion can lead to victory. 

Keywords: Amazons, evaluation function, Amazong 

1. Introduction 

Amazons is a many-faceted game. The game set typically used to play 
Amazons is a draughts board of size 10 x 10, four white and four black chess 
Queens (called amazons), and a supply of Go pieces of one colour (called 
arrows). The starting position and a first move by White are shown in Figure 1. 
Each move consists of two steps: (1) the player chooses an amazon of their own
colour and moves it like a chess Queen diagonally, vertically, or horizontally;
the length of the move is up to the player as long as no obstacle (another amazon
or an arrow) blocks the way; (2) this amazon then has to throw an arrow. Arrows
also move like chess Queens. They stay at their destination square for the rest
of the game and are represented by black squares in this paper (see Figure 1,
right). The players move alternately until one player can no longer move.
This happens after at most 92 moves. The player who made the last move wins 
the game. White’s advantage of making the first move can be compensated by 
allowing Black to pass n times (e.g., n = 4). 
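
The two-step move rule can be sketched as a small move generator. This is an illustrative sketch, not Amazong's implementation: the board encoding and all function names are assumptions, and the starting squares (White on a4, d1, g1, j4; Black on a7, d10, g10, j7) are taken to be the standard ones because Figure 1 is not reproduced in the text. Under those assumptions the number of first moves should match the count quoted in the caption of Figure 1.

```python
# Illustrative move generator for Amazons on a 10 x 10 board.
EMPTY, WHITE, BLACK, ARROW = ".", "W", "B", "x"
SIZE = 10
DIRECTIONS = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
              (0, 1), (1, -1), (1, 0), (1, 1)]


def start_position():
    """Assumed standard start: White a4, d1, g1, j4; Black a7, d10, g10, j7."""
    board = [[EMPTY] * SIZE for _ in range(SIZE)]
    for row, col in [(3, 0), (0, 3), (0, 6), (3, 9)]:
        board[row][col] = WHITE
    for row, col in [(6, 0), (9, 3), (9, 6), (6, 9)]:
        board[row][col] = BLACK
    return board


def queen_targets(board, row, col):
    """Squares reachable from (row, col) by one unobstructed Queen slide."""
    targets = []
    for d_row, d_col in DIRECTIONS:
        r, c = row + d_row, col + d_col
        while 0 <= r < SIZE and 0 <= c < SIZE and board[r][c] == EMPTY:
            targets.append((r, c))
            r, c = r + d_row, c + d_col
    return targets


def legal_moves(board, player):
    """All (from, to, arrow) triples available to `player`."""
    moves = []
    for row in range(SIZE):
        for col in range(SIZE):
            if board[row][col] != player:
                continue
            for to_row, to_col in queen_targets(board, row, col):
                # Step 1: move the amazon; its origin square becomes empty.
                board[row][col], board[to_row][to_col] = EMPTY, player
                # Step 2: throw an arrow from the new square.
                for arrow in queen_targets(board, to_row, to_col):
                    moves.append(((row, col), (to_row, to_col), arrow))
                # Undo the amazon move before trying the next target.
                board[row][col], board[to_row][to_col] = player, EMPTY
    return moves


print(len(legal_moves(start_position(), WHITE)))
```

The bound of 92 moves follows from the same board: 100 squares minus 8 amazons leaves 92 empty squares, and every move fills exactly one of them with an arrow.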

We first heard about Amazons at a workshop on combinatorial game theory
at MSRI in July 2000. We were fascinated by the depth and subtlety
of ’simple’ positions in Amazons that have been analysed by Berlekamp
(2000), Snatzke (1996, 2002), and Müller and Tegos (2002). Inspired by discussions
with Müller about his computer program Arrow and our own experiences
of playing Amazons we started to write the computer program Amazong.
After two years of successive improvements, Amazong won
the Amazons tournament at the seventh Computer Olympiad in Maastricht in
July 2002. The reader is invited to play against the Java applet Amazong at
http://www.math.unibas.ch/~lieberum/amazong/amazong.html.

Figure 1. One good first move out of 2176 possible ones.

The general design of our program with a special focus on selective search 
has been presented in talks at the Universities of Jena and Edmonton (Lieberum, 
2002). This paper complements these talks and concentrates on Amazong’s 
evaluation function that causes its characteristic style of play, clearly distin- 
guishes it from other programs, and is probably its main strength. 

2. The Different Phases of an Amazons Game 

Amazong distinguishes three phases of the game: (1) the opening at the 
beginning of the game, (2) the filling phase at the end of the game, and (3) the 
main game that consists of everything else. 

The opening in Amazons is the greatest challenge for computer programs. 
The reason is fourfold: (1) the absence of opening theory, (2) a branching 
factor of more than 1000, (3) many situations with more than 20 reasonable 
moves, and (4) the need for calculating deep variations. At this moment human 
play is still superior to computers in the opening. At the Computer Olympiad 
in Maastricht in 2002 Amazong made a random choice of the first move 
out of three possibilities and then started to play according to the results of 
a selective 5- or 6-ply search. Meanwhile, the opening book has grown to 
a machine generated database containing more than 30,000 moves following 
ideas of Lincke (2001). However, the benefit of opening books is limited in 
Amazons because of the enormous complexity of the game. 

The filling phase consists of those positions where each empty square on
the board can be reached by at most one player by some sequence of moves.
In most games this happens after approximately 50 moves. The filling phase
includes positions with completely decomposed boards, meaning that amazons
of different colours are separated by arrows. Then the outcome of the game can
be determined by counting the number of moves left to each player. Although
this problem is NP-hard (Buro, 2001), it is not difficult to play correctly in most
positions that show up in real games on a board of size 10 x 10. Typically,
the players stop playing and agree on the outcome of the game when the filling
phase starts.

Two examples of positions from the filling phase are illustrated in Figure 2. 
In the position on the left side of Figure 2 it seems that White has access to 
two empty squares, but he has to cut off one of the empty squares with his 
next move. Therefore this shape is called a defective territory. The position on 
the right side of Figure 2 is called zugzwang because it seems that Black has 
access to three empty squares, but if Black has to move before White does, then 
Black can only use two of the three empty squares. Amazong already tries to 
evaluate defective territory and many zugzwang situations correctly before the
filling phase starts. However, these parts of Amazong’s evaluation function 
are still far from being perfect. They will not be discussed here. 




Figure 2. Defective territory and zugzwang. 

Since in most Amazons games the opening book covers only the first few 
moves, one has to deal with many different situations in the main game until the 
filling phase begins and the outcome of the game becomes clear. One possible 
parameter which could help to choose an appropriate strategy in each situation 
is the number of moves played so far. Amazong uses a different parameter to 
choose its strategy. This will be discussed in the next section. 

3. Territorial and Positional Evaluation 

The goal of Amazons is to have access to more empty squares in the filling 
phase than the other player. When player $j$ ($j \in \{1, 2\}$, player 1 is White)
has exclusive access to a region of n squares, we count these squares as n 
secure points of territory of player j. When both players can reach a square by 
some sequence of moves, it is more complicated to predict which player will 
eventually shoot at that square. For this purpose Amazong uses heuristics 
based on the following ways to measure distances on an Amazons board.

Define the distance $d_1(a, b)$ of two squares $a$ and $b$ as the minimal number
of chess Queen moves needed to go from $a$ to $b$. When there is no path, let
$d_1(a, b) = \infty$. Similarly, define the distance $d_2(a, b)$ as the minimal number
of chess King moves needed to go from $a$ to $b$. Obviously, we have $d_1(a, b) \le
d_2(a, b)$. The distances of player $j$ from square $a$ are then given by

$$D_i^j(a) = \min\{\, d_i(a, b) \mid \text{the square } b \text{ is occupied by an amazon of player } j \,\}.$$

Figure 3 (left) is an example of $D_1^j(a)$. The upper left corners of empty
squares contain the values $D_1^1(a)$ and the lower right corners contain the values
$D_1^2(a)$. Figure 3 (right) is an example of $D_2^j(a)$.
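
The player distances defined above can be computed with a multi-source breadth-first search that starts from all amazons of one player at once. The following is a minimal sketch under assumptions: the board encoding, the function names, and the starting position used for the demonstration are illustrative and not taken from Amazong.

```python
import math
from collections import deque

EMPTY, WHITE, BLACK, ARROW = ".", "W", "B", "x"
SIZE = 10
DIRECTIONS = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
              (0, 1), (1, -1), (1, 0), (1, 1)]


def start_position():
    """Assumed standard start: White a4, d1, g1, j4; Black a7, d10, g10, j7."""
    board = [[EMPTY] * SIZE for _ in range(SIZE)]
    for row, col in [(3, 0), (0, 3), (0, 6), (3, 9)]:
        board[row][col] = WHITE
    for row, col in [(6, 0), (9, 3), (9, 6), (6, 9)]:
        board[row][col] = BLACK
    return board


def player_distances(board, player, king_moves):
    """Minimal number of Queen moves (or King moves, if `king_moves` is
    true) from any amazon of `player` to each empty square.
    Unreachable squares keep the value math.inf."""
    dist = [[math.inf] * SIZE for _ in range(SIZE)]
    queue = deque()
    for row in range(SIZE):
        for col in range(SIZE):
            if board[row][col] == player:
                dist[row][col] = 0
                queue.append((row, col))
    while queue:
        row, col = queue.popleft()
        for d_row, d_col in DIRECTIONS:
            r, c = row + d_row, col + d_col
            # Amazons and arrows block the way, as in the game itself.
            while 0 <= r < SIZE and 0 <= c < SIZE and board[r][c] == EMPTY:
                if dist[r][c] == math.inf:
                    dist[r][c] = dist[row][col] + 1
                    queue.append((r, c))
                if king_moves:
                    break  # a King moves a single step only
                r, c = r + d_row, c + d_col
    return dist


board = start_position()
queen = player_distances(board, WHITE, king_moves=False)
king = player_distances(board, WHITE, king_moves=True)
print(queen[5][5], king[5][5])
```

Because every King move is also a Queen move, the Queen distance returned for a square is never larger than the King distance, which mirrors the inequality stated above.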





Figure 3. The minimal distances $D_i^j(a)$.

All Amazons programs seem to use $D_1^j$ in one way or another (Hashimoto et
al., 2001). The idea behind the definition of $D_1^j$ is that $D_1^1(a) < D_1^2(a)$ indicates
that player 1 has better access to the square $a$ than player 2. One heuristic for
estimating the territory of player 1 is to assume that player 1 will eventually
shoot at all squares $a$ with $D_1^1(a) < D_1^2(a)$. This heuristic works very well
shortly before and in the filling phase. A problem of $D_1^j$ at the beginning of the
game is that a single amazon of player $j$ in the centre can cause low values of
$D_1^j$ on the whole board, but player $j$ cannot move the amazon in all directions
at once. Here $D_2^j$ comes in. One advantage of $D_2^j$ is its locality: often amazons
have to fulfil a certain task at their position, like guarding the territory in their
neighbourhood. Then a large value of $D_2^j(a)$ indicates that player $j$ cannot move
towards the square $a$ without causing positional damage, despite a possibly
low value of $D_1^j(a)$. Another advantage of $D_2^j$ over $D_1^j$ is that it is more
stable when the other player moves and shoots, especially when there are just
a few arrows. This makes $D_2^j$ useful for long-term estimates and will stabilise
the evaluation function at the beginning of the game.

We use $D_i^j$ to assign local evaluations between $-1$ and $1$ to each empty
square. Positive values indicate an advantage of player 1. Then we sum these
numbers over all empty squares in order to transform the local evaluations into
global ones. One possible formula for global evaluations $t_1$, $t_2$ is given by

$$t_i = \sum_{\text{empty squares } a} \Delta\bigl(D_i^1(a), D_i^2(a)\bigr),$$

where

$$\Delta(n, m) = \begin{cases}
0 & \text{if } n = m = \infty, \\
\kappa & \text{if } n = m < \infty, \\
1 & \text{if } n < m, \\
-1 & \text{if } n > m,
\end{cases}$$

and $-1 \le \kappa \le 1$ is a constant with $(-1)^j \kappa \le 0$ when player $j$ is to move.
The value $|\kappa|$ estimates the advantage of moving first when the distances of
both players to an accessible square agree. We have had good experience
with $|\kappa| \le 1/5$, but some fine-tuning is necessary after each modification
of the evaluation function. We optimise the choice of $\kappa$ frequently in order
to obtain a low volatility of the evaluations during iterative deepening. This
should help to avoid odd-even effects and supports aspiration search with narrow
$\alpha$-$\beta$ windows (Marsland, 1986).
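
The territorial evaluation can be written down directly from the case distinction above. The sketch below is illustrative only: the function names are assumptions, the distances are passed in as two parallel lists over the empty squares, and the default magnitude 0.1 is just one value inside the range reported to work well.

```python
import math


def delta(n, m, kappa):
    """Local evaluation of one empty square from the two player distances."""
    if n == m:
        # Unreachable for both players counts as 0; a tie on a reachable
        # square is worth kappa, the advantage of moving first.
        return 0.0 if n == math.inf else kappa
    return 1.0 if n < m else -1.0


def kappa_for(player_to_move, magnitude=0.1):
    """Sign convention: kappa >= 0 if player 1 moves, kappa <= 0 otherwise."""
    return magnitude if player_to_move == 1 else -magnitude


def territory(dist_player_1, dist_player_2, kappa):
    """Sum of local evaluations over all empty squares."""
    return sum(delta(n, m, kappa)
               for n, m in zip(dist_player_1, dist_player_2))


# Four empty squares: one for player 1, one tie, one for player 2,
# and one that neither player can reach.
d_1 = [1, 2, 3, math.inf]
d_2 = [2, 2, 1, math.inf]
print(territory(d_1, d_2, kappa_for(1)))
```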

A program that uses the territorial evaluation $t_1$ as its evaluation function
already plays quite reasonably, especially shortly before the filling phase. In
contrast, the value $t_2$ is useful at the beginning of the game but becomes
less significant as the game goes on. The evaluations $t_i$ share the drawback
that they do not take into account that large values of $D_i^2(a) - D_i^1(a)$ are better
for player 1 than small values. Therefore, other local evaluations than $\Delta$ seem
to be important, too. The generic approach is to replace $\Delta(n, m)$ by some
array of parameters and then to optimise these parameters. We have had good
experience with the choices

$$c_1 = 2 \sum_{\text{empty squares } a} \left( 2^{-D_1^1(a)} - 2^{-D_1^2(a)} \right),$$

$$c_2 = \sum_{\text{empty squares } a} \min\bigl(1, \max\bigl(-1, (D_2^2(a) - D_2^1(a))/6\bigr)\bigr).$$

Notice that in $c_1$ the local advantage $(D_1^1(a), D_1^2(a)) = (1, 2)$ is rewarded by
0.5 points for player 1, $(2, 3)$ by 0.25 points, $(1, 3)$ by 0.75 points, and squares
$a$ with $(D_1^1(a), D_1^2(a)) = (n, n)$ contribute 0 points. Other tuples are of minor
practical importance for $c_1$. In contrast to $c_1$, $c_2$ depends only on $D_2^2(a) - D_2^1(a)$,
and only large differences indicate a clear advantage of one player.
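
The point values quoted for the local advantages can be checked numerically. This is a small sketch with assumed function names; treating a square that neither player can reach as contributing 0 to the King-distance term is an assumption made here, since the text does not spell out that case.

```python
import math


def c1_local(n, m):
    """Contribution of one empty square to c_1, given the Queen distances
    n (player 1) and m (player 2)."""
    return 2.0 * (2.0 ** -n - 2.0 ** -m)


def c2_local(n, m):
    """Contribution of one empty square to c_2, given the King distances
    n (player 1) and m (player 2), clipped to [-1, 1]."""
    if n == math.inf and m == math.inf:
        return 0.0  # assumption: unreachable for both players counts as 0
    return min(1.0, max(-1.0, (m - n) / 6.0))


for pair in [(1, 2), (2, 3), (1, 3), (4, 4)]:
    print(pair, c1_local(*pair))
```

The clipping in `c2_local` is what makes only large differences count fully: a difference of 3 King moves yields 0.5, and anything from 6 upwards yields the maximum of 1.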

Now we have to combine the values $t_i$ and $c_i$ into one evaluation function.
A weighted sum with static weights does not seem to be appropriate for this
because the importance of the values $t_i$ and $c_i$ varies during the game. Therefore,
we should first try to compute the expected number $W$ of moves needed until
the filling phase starts. Instead of trying to estimate $W$ directly we simply
define

$$w = \sum_{a} \ldots,$$

where we sum over all empty squares $a$ with $D_1^1(a) < \infty$ and $D_1^2(a) < \infty$.
Obviously, we have $w = 0$ if and only if the position belongs to the filling
phase, and typically $w$ decreases with the number of moves played. Therefore,
we expect that a good estimate of $W$ will be some function of $w$. For our
purposes, $w$ is just as good as $W$. Now define an evaluation $t$ as

$$t = f_1(w)\, t_1 + f_2(w)\, c_1 + f_3(w)\, c_2 + f_4(w)\, t_2,$$

where $(f_i)_i$ is a partition of 1 (meaning $0 \le f_i(w)$ and $\sum_i f_i(w) = 1$). The
exact form of the functions $f_i$ is a problem of parameter optimisation. Our
choice of $f_i$ has been guided by the observation that $t_1$ becomes more and
more important during the main game and gives very good estimates of the
expected territory shortly before the filling phase. Hence, $f_1$ is monotonically
decreasing and satisfies $f_1(0) = 1$. The counterpart of $t_1$ is $t_2$. It rewards
balanced distributions of a player’s own amazons on the board or helps to hinder the
other player from reaching such a distribution. This is most important at the
beginning of the game. The values $c_1$ and $c_2$ make it possible to detect finer properties of
the position than $t_1$ and $t_2$ alone, because they depend on the quality of local
advantages. They support good positional play in the opening and a smooth
transition between the beginning and later phases of the game. This is most
evident for $t_1$ and $c_1$: while at the end of the game only $t_1$ counts, $c_1$ rewards
moves in earlier phases of the game that replace clear local disadvantages by
small disadvantages and small advantages by clear advantages.
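
To make the role of the phase-dependent weights concrete, here is a sketch of one possible partition of 1. The weight functions below are purely hypothetical placeholders: they only satisfy the stated constraints (non-negative, summing to 1, with the weight of the Queen-distance territory term decreasing in w and equal to 1 at w = 0) and are not Amazong's tuned functions. The value w is taken as an input.

```python
def phase_weights(w, scale=40.0):
    """Hypothetical weights (f1, f2, f3, f4) forming a partition of 1.
    `scale` is an arbitrary placeholder constant."""
    s = w / (w + scale)   # 0 at w = 0, approaches 1 for large w
    f1 = 1.0 - s          # weight of t_1: equals 1 in the filling phase
    f2 = 0.25 * s         # weight of c_1
    f3 = 0.25 * s         # weight of c_2
    f4 = 0.5 * s          # weight of t_2: matters most early in the game
    return f1, f2, f3, f4


def combined_evaluation(w, t1, c1, c2, t2):
    """t = f1(w) t_1 + f2(w) c_1 + f3(w) c_2 + f4(w) t_2."""
    f1, f2, f3, f4 = phase_weights(w)
    return f1 * t1 + f2 * c1 + f3 * c2 + f4 * t2


# In the filling phase (w = 0) only t_1 counts.
print(combined_evaluation(0.0, t1=3.0, c1=9.0, c2=9.0, t2=9.0))
# Early in the game the other components dominate.
print(combined_evaluation(60.0, t1=3.0, c1=9.0, c2=9.0, t2=9.0))
```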

4. Mobility of Individual Amazons 

Amazong tries to enclose amazons of the other player inside small
regions at the beginning of the game. Compared to other computer programs,
this is Amazong’s main strength. In this section we present a modification of
the evaluation function $t$ (see Section 3) that is responsible for this behaviour.

Enclosing amazons typically does not cause an appropriate change of $t$ (and
especially of $t_1$) at the beginning of the game. This can be explained as follows:
when a single amazon $A$ of player 1 is enclosed in some small region of $n$ points,
then the Amazons board is divided into two parts: the inside and the outside of
that region. Player 1 has exclusive access to the territory on the inside. This
contributes $n$ points to $t$. On the outside, some active amazons of player 1 might
overshadow the missing influence of $A$ in $D_i^1$. In addition, some amazons of
player 2 that have helped to enclose $A$ might not be in optimal positions but often
have a large potential to improve their positions. The problem that $A$ cannot
reach the outside for the rest of the game is not reflected in the computation
of $t$. The disadvantage of the enclosed amazon often starts to affect $t$ only several
moves later. Then it is too late. Therefore, a correction term $m$ is needed to
take into account the mobility of individual amazons. Since active amazons
can overshadow bad positions of passive amazons in the evaluation function $t$, it
seems more important to punish passive and enclosed amazons than to support
active amazons in this correction term. To compute $m$ quickly, consider first
the number $N(a)$ of empty squares that can be reached from $a$ by a single move
of a chess King. The numbers $N(a)$ can be updated incrementally during the
search inside the functions doMove and undoMove. For an amazon $A$ of player
$j$ on the square $a$, let



$$\alpha_A = \sum_{b} 2^{1 - d_2(a, b)} N(b),$$

where we sum over all squares $b$ with $d_1(a, b) \le 1$ and $D_1^{3-j}(b) < \infty$.
When $\alpha_A = 0$ we say that the amazon $A$ is enclosed. Examples of the values
$N(a)$, $\alpha_A$ and of enclosed amazons are shown in Figure 4. For example,
for the white amazon $A$ in the upper left corner of the figure on the left, we
compute $\alpha_A = 7 + 6 + 5 + 3 + 3 + (5 + 4 + 7 + 4)/2 + 5/4 = 35.25$. The
two white amazons in the lower right corner in this figure are enclosed.
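
The arithmetic of the worked example can be reproduced in a few lines. The sketch assumes that the three groups of terms belong to squares at King distance 1, 2 and 3 from the amazon, which is inferred here from the weights 1, 1/2 and 1/4 in the example; the function name and the input format are illustrative.

```python
def alpha(contributions):
    """Mobility value of one amazon from (king_distance, neighbour_count)
    pairs of the squares it can reach with a single Queen move."""
    return sum(n_b * 2.0 ** (1 - d2) for d2, n_b in contributions)


# Neighbour counts N(b) of the worked example, grouped by assumed King distance.
example = [(1, 7), (1, 6), (1, 5), (1, 3), (1, 3),
           (2, 5), (2, 4), (2, 7), (2, 4),
           (3, 5)]
print(alpha(example))   # 35.25
print(alpha([]))        # 0: no reachable square, the amazon is enclosed
```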




Figure 4. Neighbours $N(a)$ of empty squares $a$ and the values $\alpha_A$.

We have learned in discussions with experienced Amazons players that at the
beginning of a game on a board of size 10 x 10 enclosed amazons should be
punished by a penalty of at least 10 points. In general, we use $w$ from the last
section to define

$$m = \sum_{\text{amazons } B \text{ of player 2}} f(w, \alpha_B) \; - \sum_{\text{amazons } A \text{ of player 1}} f(w, \alpha_A)$$

for a suitable function $f \ge 0$. The exact choice of $f$ is the hardest optimisation
problem in our evaluation function $t + m$, so we restrict our description
to the properties of $f$ that did not change during our experiments: $f$ satisfies
$f(0, y) = 0$ and $\frac{\partial f}{\partial x}(x, y) \ge 0$ because the longer an amazon is enclosed before
the filling phase starts the bigger is the disadvantage. Furthermore, $f$ satisfies
$\frac{\partial f}{\partial y}(x, y) \le 0$ because a low value of $\alpha_A$ corresponds to a passive position of the
amazon $A$. The last dependence is not linear. We have had good experience
with functions $f$ that satisfy $2 f(x, 5) \le f(x, 0)$. This can be explained as
follows: $\alpha_A \approx 5$ indicates that the amazon $A$ is almost enclosed. However,
there is a big difference between an enclosed and an almost enclosed amazon.
The other player possibly has to move one of their own amazons $B$ to an unfavourable
square to prevent $A$ from escaping. The resulting change of $t$ then has to be
compensated for by $m$. In addition, the task of guarding $A$ makes the amazon $B$
less mobile.
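
The properties required of the penalty function can be illustrated with a simple candidate. The function below is a hypothetical placeholder and not the function used in Amazong: it is merely one shape that is zero at w = 0, grows with w, decays with the mobility value, and is more than halved between a mobility of 0 and a mobility of 5. All constants are arbitrary.

```python
def penalty(w, alpha, scale=15.0):
    """Hypothetical f(w, alpha) >= 0 with the qualitative properties
    described in the text; `scale` and the other constants are placeholders."""
    return scale * (w / (w + 20.0)) * 0.5 ** (alpha / 4.0)


def mobility_term(w, alphas_player_1, alphas_player_2):
    """m = sum of penalties for player 2's amazons
           minus sum of penalties for player 1's amazons."""
    return (sum(penalty(w, a) for a in alphas_player_2)
            - sum(penalty(w, a) for a in alphas_player_1))


# Player 1 has one enclosed amazon (mobility 0), player 2 has none:
# the correction is negative, i.e. bad for player 1.
print(mobility_term(40.0, [0.0, 30.0, 25.0, 35.25], [20.0, 30.0, 25.0, 21.0]))
```

A sharper drop near zero than the one used here would express even more strongly that an enclosed amazon is much worse off than an almost enclosed one.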

The big difference between enclosed and almost enclosed amazons can be
seen on the right side of Figure 4: White can enclose the black amazon $B$
with $\alpha_B = 1$ in his next move, but then Black can reply by enclosing the white
amazon, too. Similarly, the task of guarding the white amazon in the upper left
corner puts the black amazon $B$ with $\alpha_B = 21$ in danger of getting enclosed.

5. Comparison between $t_1$ and $t + m$

In this section we compare our evaluation function $t + m$ with $t_1$ by using the
game Amazong vs. 8QP played at the 7th Computer Olympiad in Maastricht.
The position after 26 moves in this game is shown in Figure 3. Amazong won
the game by 8 points, mainly due to the enclosed black amazon in the upper left
corner. Figure 5 shows how $t_1$ and the different components of $t + m$ varied
during the game.

In this diagram the values $t_i$ are computed using $|\kappa| = 0.1$. The lines
corresponding to $t_1$, $t + m$ and $m$ are clearly visible in the diagram. On
move 13 White enclosed the black amazon, which causes the maximum of the
dashed line corresponding to $m$. Notice that at this point the evaluation $t + m$
predicts the outcome of the game very well and differs from $t_1$ by more than 18
points. Then $t + m$ and $t_1$ become more and more related and finally coincide
when the filling phase is reached.




An Evaluation Function for the Game of Amazons




Figure 5. The components of the evaluation function t + m during a game. 

As expected, the values $c_2$ and $t_2$ are more stable than $c_1$ and $t_1$. In addition, $c_2$
and $t_2$ are positive in almost all positions of the game. This indicates that the
evaluation function of 8QP does not consider King move distances. Therefore,
8QP puts up no resistance against Amazong maximizing these components
of $t + m$.

6. Refinements and Outlook 

Consider positions with regions that are (almost) separated by arrows. How
much is it worth when one player has a majority of amazons inside such a
region? Instead of looking for a general answer to this difficult question, we
simply observe that the territorial evaluation $t$ tends to underestimate
the advantage of the majority. A possible correction term for $t$ could take into
account the distances between each empty square and each amazon. However,
the computation of these values would take almost four times longer than the
computation of $D_i^j(a)$. Therefore, it seems more appropriate to compute only
the numbers of amazons $A_\nu$ of player $j$ on squares $b_\nu$ that satisfy $d_i(a, b_\nu) =
D_i^j(a)$. These numbers can be computed efficiently together with $D_i^j(a)$. They
are useful as additional inputs of refined definitions of $c_i$ and $t_i$. In addition to
these corrections, the disadvantage of having a majority of amazons in a small
region early in the game should be reflected by $m$. This situation is not treated
correctly by $m$ because when amazons of both players are inside one region
the involved amazons are not considered as being enclosed.

A second refinement concerns the distribution of amazons on the board. In
the opening it is desirable (especially for Black) to reach a position with exactly
one amazon in each corner of the board. The distances from such a distribution
can be used to improve the evaluation function in the opening phase.

In some experiments, we weighted squares in the computations of $c_i$ and $t_i$.
The weights depended on $w$ and the distance of the square from the centre of
the board. It is difficult to assess the importance of this third refinement.

A fourth idea for improvements is to repeat the constructions of Section
3 for other distance functions such as $d_1 + d_2$ or $2d_1 + d_2$ (or estimates of
these distances that can be computed more efficiently). One has to decide very
carefully how many different distance functions one should use, because each
additional distance function slows down evaluations considerably.

The biggest weakness of our evaluation function seems to be the underes- 
timation of large territorial frameworks at the beginning of the game. Our 
hope (potential fifth refinement) is to incorporate ideas from Lorentz (2002) to 
overcome this weakness. It is difficult because in many situations one has to 
make a choice between two plans that are often incompatible: (1) chasing and 
enclosing amazons or (2) building large territorial zones. The decision which 
plan is the more promising one in an actual position is a challenge for the next 
generation of Amazons programs.

Abstract Opponent-Model search is a game-tree search method that explicitly uses
knowledge of the opponent. There is some risk involved in using Opponent-Model
search. Both the prediction of the opponent’s moves and the estimation of the 
profitability of future positions should be of good quality and as such they should 
obey certain conditions. To investigate the role of prediction and estimation in 
actual computer game-playing, experiments with Opponent-Model search were 
performed in the game of Bao. After five evaluation functions had been generated
using machine-learning techniques, a series of tournaments between these 
evaluation functions was executed. They showed that Opponent-Model search 
can be applied successfully, provided that the conditions are met. 

Keywords: opponent models, search, evaluation functions, Bao 

1. Introduction 

This contribution investigates under what conditions the usual form of Oppon- 
ent-Model search (OM search) can be made successful. To understand the mat- 
ter we provide a condensed introduction to OM search in Section 2. In Section 
3 we give a brief overview of the family of mancala games to which Bao be- 
longs and we describe the Bao rules. In Section 4 we explain how we obtained 
five evaluation functions for Bao. Section 5 gives the tournament setup and 
in Section 6 we present and discuss the results. The contribution ends with 
conclusions in Section 7. 

2. Opponent-Model Search 

OM search (Carmel and Markovitch, 1993; Iida, Uiterwijk, and Van den 
Herik, 1993; Carmel and Markovitch, 1998; Donkers, Uiterwijk, and Van den 
Herik, 2003) is a game-tree search algorithm that uses a player’s hypothesized 
model of the opponent in order to exploit weak points in the opponent’s search 
strategy. The original formulation of the OM-search algorithm is based on three 
strong assumptions concerning the opponent and the player: 

(1) the opponent (called MIN) uses minimax (or an equivalent algorithm) 
with an evaluation function ($V_{op}$), a search depth, and a 
move ordering that are known to the first player (called MAX); 

(2) MAX uses an evaluation function ($V_0$) that is better than MIN’s 
evaluation function; 

(3) MAX searches at least as deep as MIN. 

This OM-search procedure prescribes that MAX maximizes at MAX nodes, 
and selects at MIN nodes the moves that MAX believes MIN would select. Below 
we provide a short technical description of OM search, its notation, the relations 
between the nodes in the search tree, and some hints for an efficient 
implementation. For an extensive description of OM search we refer to Donkers et al. 
(2003). 

OM search can be described by the following equations, in which $V_0(\cdot)$ and 
$V_{op}(\cdot)$ are the evaluation functions, and $v_0(\cdot)$ and $v_{op}(\cdot)$ are the node 
values. Subscript ‘0’ is used for MAX values, subscript ‘op’ is used for MIN values. 



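\[
v_0(P) =
\begin{cases}
\max_j v_0(P_j) & \text{MAX nodes,} \\
v_0(P_j),\quad j = \arg\min_i v_{op}(P_i) & \text{MIN nodes,} \\
V_0(P) & \text{leaf nodes.}
\end{cases}
\tag{1}
\]

\[
v_{op}(P) =
\begin{cases}
\max_j v_{op}(P_j) & \text{MAX nodes,} \\
\min_j v_{op}(P_j) & \text{MIN nodes,} \\
V_{op}(P) & \text{leaf nodes.}
\end{cases}
\tag{2}
\]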



If $P$ is a MIN node at a depth larger than the search-tree depth of the opponent, 
then $v_0(P) = \min_j v_0(P_j)$. 
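
Equations (1) and (2) can be illustrated with a minimal Python sketch. The tree encoding is an assumption made only for this illustration: an internal node is a list of children and a leaf is a tuple `(V_0, V_op)`. The sketch resolves ties between MIN's moves by taking the first one, and it does not handle the case in which a MIN node lies deeper than the opponent's search depth.

```python
def v_op(node, is_max):
    """Equation (2): plain minimax using the opponent's evaluation V_op."""
    if isinstance(node, tuple):            # leaf: (V_0, V_op)
        return node[1]
    values = [v_op(child, not is_max) for child in node]
    return max(values) if is_max else min(values)


def v_0(node, is_max):
    """Equation (1): the OM-search value of a node for MAX."""
    if isinstance(node, tuple):            # leaf: (V_0, V_op)
        return node[0]
    if is_max:
        # MAX node: maximize over all children.
        return max(v_0(child, False) for child in node)
    # MIN node: follow only the child that MIN is predicted to choose,
    # i.e. the child with the lowest v_op (first one on ties).
    predicted = min(node, key=lambda child: v_op(child, True))
    return v_0(predicted, True)


# Root is a MAX node with two MIN children, each with two leaves.
tree = [[(8, 1), (2, 5)], [(4, 3), (6, 7)]]
om_value = v_0(tree, True)
```

In this small tree MIN is predicted to answer with the leaves `(8, 1)` and `(4, 3)`, so the OM-search value is 8, whereas plain minimax on the $V_0$ values alone would give 4.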



2.1 Implementation 

For a search tree with branching factor $w$ and even fixed depth $d$, OM search 
needs exactly $n = w^{d/2}$ evaluations of $V_0(\cdot)$ to determine the root value, since 
the search strategy is as follows: in each MAX node all $w$ children are investigated 
and in each MIN node, only one child is investigated (see Donkers, Uiterwijk, 
and Van den Herik, 2001). Because the OM-search value is defined as the 
maximum over all these $n$ values of $V_0(\cdot)$, none of these values can be missed. 
This means that the efficiency of OM search depends on how efficiently the values 
for $v_{op}(\cdot)$ can be obtained. 
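
The count $n = w^{d/2}$ follows directly from the search strategy and can be checked with a short recursion. This is only an illustrative sketch; the function name is hypothetical.

```python
def count_v0_evaluations(w, d, is_max=True):
    """Number of V_0 leaf evaluations in a uniform tree of width w, depth d."""
    if d == 0:
        return 1                                   # one leaf evaluation
    if is_max:
        # A MAX node investigates all w children.
        return w * count_v0_evaluations(w, d - 1, False)
    # A MIN node investigates only the single predicted child.
    return count_v0_evaluations(w, d - 1, True)


evaluations = count_v0_evaluations(4, 6)   # 4 ** 3 = 64, versus 4 ** 6 leaves
```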

A straightforward and efficient way to implement OM search is by applying 
α-β probing: at a MIN node it starts performing α-β search with the opponent’s 
evaluation function (the probe), and thereafter it performs OM search with the 
move that α-β search has returned; at a MAX node, it maximizes over all child 
nodes. The probes can be efficiently implemented by using an enhanced form of 
α-β search. Because a separate probe is performed for every MIN node, many 
nodes are visited during multiple probes. (For example, every MIN node $P_j$ on 
the principal variation of a node $P$ will be probed at least twice.) Therefore, 
the use of transposition tables leads to a major reduction of the search tree. 
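
The probing scheme can be sketched as follows. This is a simplified illustration, not the implementation used in the experiments: it assumes a tree encoded as nested lists with `(V_0, V_op)` tuples as leaves, uses plain fail-soft α-β without the enhancements mentioned above, and omits transposition tables and any depth limit on the probes.

```python
import math


def alphabeta_op(node, is_max, alpha=-math.inf, beta=math.inf):
    """Fail-soft alpha-beta value of `node` under the opponent's V_op."""
    if isinstance(node, tuple):                    # leaf: (V_0, V_op)
        return node[1]
    if is_max:
        value = -math.inf
        for child in node:
            value = max(value, alphabeta_op(child, False, alpha, beta))
            alpha = max(alpha, value)
            if alpha >= beta:                      # beta cut-off
                break
        return value
    value = math.inf
    for child in node:
        value = min(value, alphabeta_op(child, True, alpha, beta))
        beta = min(beta, value)
        if alpha >= beta:                          # alpha cut-off
            break
    return value


def probe(min_node):
    """Return the child of a MIN node that alpha-beta with V_op selects."""
    best_child, best_value = None, math.inf
    for child in min_node:
        # The best value so far serves as the beta bound for the next child.
        value = alphabeta_op(child, True, -math.inf, best_value)
        if value < best_value:
            best_child, best_value = child, value
    return best_child


def om_search(node, is_max=True):
    """OM search: maximize at MAX nodes, follow the probe at MIN nodes."""
    if isinstance(node, tuple):
        return node[0]
    if is_max:
        return max(om_search(child, False) for child in node)
    return om_search(probe(node), True)


tree3 = [
    [[(5, 2), (1, 9)], [(7, 4), (3, 3)]],
    [[(6, 8), (2, 1)], [(9, 6), (4, 5)]],
]
root_value = om_search(tree3)
```

In `tree3` the probes predict that MIN chooses the second child at both MIN nodes, so MAX obtains 9 at the root.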

The α-β probes at a MIN node $P$ and at each grandchild (MIN nodes $P_{jk}$) are 
not independent since the α-β value of $P$, $v_{op}(P)$, is necessarily equal to or 
larger than all α-β values of $P_{jk}$. This means that $v_{op}(P)$ can be used to reduce 
the window of the probes at the grandchild nodes by setting the β parameter of 
the probe at $P_{jk}$ to $v_{op}(P) + 1$. 

OM search assumes that MAX speculates at all MIN nodes about the move 
that MIN is going to choose. In deeper parts of the search tree, the prediction of 
MIN’s move is based on shallower α-β probes than higher in the tree. It could 
therefore be justified to speculate only in the upper portion of the search tree. 

2.2 Risk in Opponent-Model Search 

Although using knowledge of the opponent during search seems obvious and 
OM search looks like a reasonable approach, there are three different types of 
risk involved. If these risks are not taken seriously, OM search is bound to fail. 

First, OM search does not take into account any uncertainty about the oppo- 
nent: the reasoning by the algorithm assumes perfect knowledge in the above 
sense. Since perfect knowledge of the opponent is hardly available in reality, 
this is a strong assumption. When the knowledge of the opponent is not per- 
fect, the algorithm can still be used, but this will cause a certain amount of risk, 
depending on the quality of the knowledge. This first kind of risk has been 
described extensively in Iida, Handa, and Uiterwijk (1995). (In Donkers et al. 
(2001) an extension of OM search is described that does include uncertainty: 
Probabilistic Opponent-Model search.) 

In Donkers et al. (2003), a second kind of risk in using OM search is 
investigated. It appears that even when MAX has perfect knowledge of MIN’s 
evaluation function, using OM search may be unwise: when MAX makes a 
large overestimation of the profitability of a certain position while MIN is 
judging it correctly, then MAX is possibly attracted to that position. A condition 
that should prevent this from happening is called admissibility of the pair of 
evaluation functions: MAX should not overestimate a position that MIN does not 
also overestimate. 

A third kind of risk in using OM search (introduced in this contribution) 
is as follows. Perfect knowledge of the opponent’s evaluation function is not 
equal to a perfect prediction of the opponent’s moves. This is caused by the 
difference (normally one ply) in search depth between a player’s prediction of 
the opponent’s move and the actual search that the opponent uses at the next 
move. 

For OM search to be successful, the effects of these risks should be alleviated. 
In a series of experiments on the game of Bao, we investigate what the influence 
is of a good prediction of the opponent’s moves and what the influence is of a 
better estimation of the own profitability of positions. Moreover, we study the 
effect of risk management. We assume perfect knowledge of the opponent’s 
evaluation function but admissibility is not guaranteed. 

3. The Mancala Game Bao 

In large parts of the world, board games of the mancala group are being played 
in completely different versions (cf. Murray, 1952; Russ, 2000). Whatever the 
case, most mancala games share the following five properties: 

(1) the board has a number of holes, usually ordered in rows; 

(2) the game is played with indistinguishable counters (also called 
pebbles, seeds, shells); 

(3) players own a fixed set of holes on the board; 

(4) a move consists of taking counters from one hole and putting 
them one-by-one in subsequent holes (sowing), possibly followed 
by some form of capture; 

(5) the goal is to capture the most counters (for Bao it is to immobilize 
the opponent). 

Mancala games differ in the number of players (1, 2 or more), the size 
and form of the board, the starting configuration of the counters, the rules for 
sowing and capturing, and in the way the game ends. The games of the mancala 
group are known by many names (for instance Wari, Awele, Bao, Dakon, and 
Pallankuli). For an overview of different versions and the rules of many mancala 
games, we refer to Russ (2000). 

Among the mancala games, (Zanzibar) Bao is regarded as the most complex 
one (De Voogt, 1995). This is mainly due to the number of rules and to the 
complexity of the rules. Bao is played in Tanzania and on Zanzibar in an 
organized way. There exist Bao clubs that own the expensive boards and that 
organize official tournaments. 

The exact rules of the game are given in, for example, De Voogt (1995). 
Below, we summarize the properties that discriminate the game from the more 
widely known games Kalah and Awari. 

Bao is played on a board with 4 rows of 8 holes by two players, called South 
and North, see Figure 1. Two square holes are called houses and play a special 
part in the game. There are 64 stones involved. At the start of the game each 
player has 10 stones on the board and 22 stones in store. Sowing only takes 
place on the own two rows of holes. The direction of sowing is not fixed. At 
the start of a move, a player can select a direction for the sowing (clockwise or 
anti-clockwise). During sowing or at a capture, the direction can turn at some 
point. This is dictated by deterministic rules. 

If a capture is possible then it is obligatory in Bao. This means that a position 
is either a capture position or a non-capture position. Captured counters do 
not leave the board but re-enter the game. Counters are captured from the 
opponent’s front row. These counters are immediately sown in the own front 
row. This implies that the game does not converge like Kalah and Awari. 

Moves are composite. If at the end of a sowing, capture is possible, the 
captured counters are sown immediately at the own side of the board. This 
second sowing can again lead to a new capture followed by a new sowing. If 
a capture is not possible, and the hole reached was non-empty, all counters are 
taken out of that hole and sowing continues. This procedure goes on until an 
empty hole is reached, which ends the move. 

Moves can be endless because in a non-capture move, sowing can go on 
forever. The existence of endless moves can be proven theoretically (Donkers, 
Uiterwijk, and De Voogt, 2002). In real games, moves that take more than an 
hour of sowing also occasionally occur, but players usually make small mistakes 
during sowing or simply quit the game. So, real endless moves never lead to 
endless sowing. 
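
The continued sowing in a non-capture move can be sketched in a few lines. This is a heavily simplified illustration and not the Bao rule set: it assumes that the 16 holes of one player form a single loop, and it ignores captures, houses, the store, and direction changes. The function name and the `max_sown` cut-off parameter are hypothetical; the cut-off only serves to stop a potentially endless move.

```python
def relay_sow(holes, start, direction, max_sown=100):
    """Sow from `start`; continue from each non-empty end hole.

    `holes` holds the counters of one player's loop of holes, `direction`
    is +1 or -1. Returns the new list of counters, or None if more than
    `max_sown` counters were sown (the move is treated as endless).
    """
    holes = list(holes)
    n = len(holes)
    pos = start
    in_hand, holes[pos] = holes[pos], 0
    if in_hand == 0:
        raise ValueError("cannot sow from an empty hole")
    sown = 0
    while True:
        while in_hand > 0:
            pos = (pos + direction) % n
            holes[pos] += 1
            in_hand -= 1
            sown += 1
            if sown > max_sown:
                return None
        if holes[pos] == 1:
            # The last counter fell into an empty hole: the move ends.
            return holes
        # The end hole was non-empty: take all its counters and continue.
        in_hand, holes[pos] = holes[pos], 0


start_row = [2, 0, 1] + [0] * 13
after_move = relay_sow(start_row, start=0, direction=1)
```

In the example, two counters are sown from hole 0, the sowing ends in the non-empty hole 2, and the two counters found there are sown onward until hole 4, which was empty.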

Bao games consist of two stages: in the first stage, stones are entered one by 
one on the board at the start of every move. In the first stage, a game ends if the 
player to move has no counters left in the front row. As soon as all stones are 
entered, the second stage begins and a new set of rules applies. In the second 
stage, a game ends if the player has no more than one counter in any hole of 
both rows. A draw is not defined in Bao. Note that the goal of Bao is not to 
capture the most stones, but to immobilize the opponent. 

In Donkers and Uiterwijk (2002), an analysis of the game properties of Bao is 
provided. The state-space complexity of Bao is approximated to be $1.0 \times 10^{25}$, 
which is much higher than those of Awari ($2.8 \times 10^{11}$) and Kalah ($1.3 \times 10^{13}$). 
The shortest game possible takes 5 ply, but most games take between 50 and 60 
ply because they end (soon) after the start of the second stage. The maximum 
number of moves possible at any position is 32, but the average number of 
possible moves varies between 3 and 5, depending on the stage of the game. 
Forced moves occur quite often. The average game length ($d$) and branching 
factor ($w$) are normally used to estimate the size of a game tree that has to be 
traversed during search ($w^d$). For Bao the estimate is roughly $10^{34}$. This number 
together with the state-space complexity ($10^{25}$) places Bao in the overview of 
game complexities above checkers and in the neighbourhood of Qubic (Van den 
Herik, Uiterwijk, and Van Rijswijck, 2002). 
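
The order of magnitude of $w^d$ is easy to check numerically. The values $w = 4$ and $d = 57$ used below are assumptions chosen from within the ranges quoted above; the text does not state which values underlie the estimate of $10^{34}$.

```python
import math


def log10_tree_size(w, d):
    """Base-10 logarithm of the game-tree size estimate w ** d."""
    return d * math.log10(w)


mid_estimate = log10_tree_size(4, 57)    # about 34.3, i.e. roughly 10^34
low_estimate = log10_tree_size(3, 50)    # lower end of the quoted ranges
high_estimate = log10_tree_size(5, 60)   # upper end of the quoted ranges
```

The spread between the lower and upper ends shows how sensitive the estimate is to the assumed averages.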

4. Generating Evaluation Functions for Bao 

In order to conduct the OM-search experiments, we created 5 different evalu- 
ation functions. We describe them below. (For operational reasons (see Section 
5) we would like to have them ordered in increasing quality with respect to the 
strength of the resulting players.) 

The first two evaluation functions were created by hand. The first one, called 
Material, simply takes the difference in the number of stones on both sides 
of the board as the evaluation score. The second hand-made evaluation function 
is called Default. This function incorporates some rudimentary strategic 
knowledge of Bao. For instance, it is good to have more stones in your back row 
since this increases the mobility in the second stage of the game. The function 
awards 3 points to stones in the front row, 5 points to stones in the back row, 
and 5 additional points to opponent stones that can be captured. If the own 
house is still active, 200 extra points are given. The total score of the position 
is the score for MAX minus the score for MIN. There is a small asymmetry in 
this function: if MAX can capture MIN’s house 100 points are awarded, but if 
MIN can capture MAX’s house, only 50 points are subtracted. This asymmetry 
is intended to produce a more offensive playing style. 
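
The scoring scheme of Default can be written down as a small sketch. The `Side` record and its field names are hypothetical and stand in for whatever position representation a Bao program uses; only the point values are taken from the description above.

```python
from dataclasses import dataclass


@dataclass
class Side:
    front: list                 # counters per hole in the front row
    back: list                  # counters per hole in the back row
    capturable: int             # opponent stones this side can capture
    house_active: bool          # whether the own house is still active
    can_capture_house: bool     # whether this side can capture the other house


def side_score(side):
    """3 per front-row stone, 5 per back-row stone, 5 per capturable stone."""
    score = 3 * sum(side.front) + 5 * sum(side.back) + 5 * side.capturable
    if side.house_active:
        score += 200
    return score


def default_eval(max_side, min_side):
    """Score for MAX minus score for MIN, plus the asymmetric house terms."""
    score = side_score(max_side) - side_score(min_side)
    if max_side.can_capture_house:
        score += 100            # MAX threatens MIN's house
    if min_side.can_capture_house:
        score -= 50             # MIN threatens MAX's house
    return score


south = Side(front=[2, 2], back=[3, 3], capturable=2,
             house_active=True, can_capture_house=True)
north = Side(front=[5], back=[5], capturable=0,
             house_active=False, can_capture_house=False)
as_max = default_eval(south, north)
as_min = default_eval(north, south)
```

Because of the 100/50 asymmetry, evaluating the same position from the other side does not simply negate the value: here `as_max` is 312 while `as_min` is -262.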

The third evaluation function was created by using a genetic algorithm (Hol- 
land, 1975). The evaluation function was represented by an integer-valued 
chromosome of 27 genes: one gene for the material balance, one gene per hole 
for the material in the own back and front row, one gene per hole in the front 
row for capturing, one gene for an active house, and another gene for capturing 
the opponent’s house. The total score of a position was the score for the player 
minus the score for the opponent. The fitness of a chromosome was measured 
by the number of games out of 100 that it won against a fixed opponent. In 
these matches, both players used α-β search with search depth 6. The 
genetic-algorithm parameters were as follows: the population size was 100, only the 10 
fittest chromosomes produced offspring (using a single-point crossover), the 
mutation rate was 5 per cent for large changes in a gene (i.e., generating a new 
random number for the gene) and 20 per cent for minor changes (i.e., altering 
the value of a gene slightly). The genetic algorithm was continued until no 
further improvement occurred. We conducted three runs: in the first run, the 
opponent was Default. In the second and third run, the opponent was the 
winner of the previous run. The name of the resulting evaluation function is 
Ga3. 
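
One generation of such a genetic algorithm can be sketched as follows. Several details are assumptions made for this illustration and are not given in the text: the gene range, the interpretation of the mutation rates as per-gene probabilities, a minor change of plus or minus 1, random pairing of the 10 parents, and full replacement of the population. The `fitness` argument is a stand-in for the number of games won out of 100.

```python
import random

N_GENES = 27
POP_SIZE = 100
N_PARENTS = 10
P_LARGE = 0.05                    # large change: new random gene value
P_MINOR = 0.20                    # minor change: small alteration
GENE_MIN, GENE_MAX = -100, 100    # assumed gene range


def crossover(a, b, rng):
    """Single-point crossover of two chromosomes."""
    cut = rng.randrange(1, N_GENES)
    return a[:cut] + b[cut:]


def mutate(chromosome, rng):
    """Apply large and minor mutations gene by gene."""
    result = []
    for gene in chromosome:
        r = rng.random()
        if r < P_LARGE:
            gene = rng.randint(GENE_MIN, GENE_MAX)
        elif r < P_LARGE + P_MINOR:
            gene += rng.choice((-1, 1))
        result.append(gene)
    return result


def next_generation(population, fitness, rng):
    """Only the N_PARENTS fittest chromosomes produce offspring."""
    parents = sorted(population, key=fitness, reverse=True)[:N_PARENTS]
    return [
        mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
        for _ in range(POP_SIZE)
    ]


rng = random.Random(0)
population = [[rng.randint(GENE_MIN, GENE_MAX) for _ in range(N_GENES)]
              for _ in range(POP_SIZE)]
offspring = next_generation(population, fitness=sum, rng=rng)
```

In the experiments the fitness evaluation dominates the cost, since every chromosome has to play 100 games; the placeholder `sum` is used here only to keep the sketch runnable.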

Thereafter we used another machine-learning technique to create the fourth 
evaluation function, namely TD-Leaf learning (Baxter, Tridgell, and Weaver, 
1998). This is a temporal-difference method that is specialized for learning 
evaluation functions in games. The evaluation function trained was a linear 
real-valued function with the same parameters as the genes in the chromosomes 
above, except that there were separate parameters for the two sides of the board. 
Batch learning was applied with 25 games per batch. The reinforcement signal 
used to update the parameters was the number of games in the batch that the 
player won against a fixed opponent, in this case Ga3. The search depth used 
in the games was 10. The λ-factor and the annealing factor were both set to 
0.99. This produced our fourth evaluation function, called Tdl2b. 

The last evaluation function was also produced by TD-Leaf learning, but this 
time we used a normalized Gaussian network (NGN) as evaluation function, 
similar to the way in which Yoshioka, Ishii, and Ito (1999) trained an evaluation 
function for the Othello game. The NGN had 54 nuclei in a 54-dimensional 
space. Every dimension correlated with a parameter in the previous evaluation 
function. The reinforcement signal was the number of games out of 25 won 
against a fixed opponent, being Tdl2b. The search depth used in the games 
was 6, because the computation of the output for an NGN is relatively slow. No 
batch learning was applied here. The λ-factor was set to 0.8 and the annealing 
factor was set to 0.993. This evaluation function is called Ngnd6. 

5. Experimental Set-up 

We conducted seven different tournaments between five players that each 
used one of the five evaluation functions. We denote the players by the name 
of their evaluation function. All tournaments followed a double round-robin 
system: every player was matched against every other player, one time playing 
South and one time playing North. Each match between two players consisted 
of 100 games; hence each tournament counted 2000 games. In the tables below 
the results are reported in a special way (see the caption of Table 1). One reason 
is that we can easily read off from the table any improvement by an evaluation 
function. The games began at the start positions given in the Appendix and 
were played to the end. To prevent problems with infinite moves, any move 
that involved the sowing of more than 100 stones was considered infinite and 
illegal. A position at which a player could only perform one of these long moves 
was a loss for that player. 
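
The size of a tournament follows from the double round-robin system. The following sketch only enumerates the pairings; the helper name is hypothetical.

```python
from itertools import permutations

PLAYERS = ["Material", "Default", "Ga3", "Tdl2b", "Ngnd6"]
GAMES_PER_MATCH = 100


def double_round_robin(players):
    """All ordered (South, North) pairings of distinct players."""
    return list(permutations(players, 2))


matches = double_round_robin(PLAYERS)
total_games = len(matches) * GAMES_PER_MATCH
games_as_south = (len(PLAYERS) - 1) * GAMES_PER_MATCH
```

With five players there are 20 ordered pairings, hence 2000 games per tournament and 400 games for each player as South.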

In the first two tournaments, both players used α-β search. In the other five 
tournaments, South used OM search with perfect knowledge of the opponent’s 
evaluation function. North always used α-β search with search depth 6. The 
search depth of South differed per tournament. No time restrictions were given. 
We used an implementation of OM search with α-β probes and allowed only 
one ply of speculation. Since in Bao draws are not possible, and since we aimed 
to compare the performance of the different search algorithms used by South, 
the score of a match was just the number of games out of 100 that were won by 
South. 

At every position at which South was to move, we also detected the move(s) 
that α-β search would select for South. In this way we were able to count the 
number of times that OM search differed from α-β. 

In the implementation of the α-β probes for OM search we took care of the 
fact that (some of) the evaluation functions are asymmetric. The asymmetry 
implies that evaluating a position when South is MAX is not the same as 
evaluating the same position when North is MAX and taking the negative of the 
value. Furthermore, we dealt with multiple equipotent moves for MIN: if MIN 
has multiple equal choices, MAX will select the move with the lowest value for 
$v_0$. 

The exact set-up of each of the seven tournaments will be explained along 
with the results in the next section. 

6. Results and Discussion 

First tournament: α-β plain. Table 1 gives the outcome of the first 
tournament. Both South and North used α-β search with search depth 6. The 
table clearly shows that the evaluation functions differ in quality and that every 
following evaluation function is operationally better than any of the previous 
ones. (Since the size of each match is 100, the 95% confidence intervals are 
approximately plus/minus 10 per match and plus/minus 20 for the total scores.) 
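
The quoted interval widths are consistent with a normal approximation to the binomial distribution. The sketch below assumes independent games and the worst case of a win probability of 0.5; the text does not state how the intervals were actually obtained.

```python
import math


def ci_half_width(n_games, p=0.5, z=1.96):
    """Half-width of the 95% interval for the number of wins in n_games."""
    return z * math.sqrt(n_games * p * (1.0 - p))


per_match = ci_half_width(100)    # about 9.8 games
per_total = ci_half_width(400)    # about 19.6 games
```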

Second tournament: α-β extended. The second tournament was a checking 
tournament. South was allowed to search two extra plies (8 instead of 6). The 
results are presented in Table 2. The table shows that all players profited from 
the increased search depth. Only the match of Default against Ga3 was less 
fortunate for Default. This illustrates the poor quality of this evaluator. 

Third tournament: OM plain. In the third tournament, South used OM search 
with one ply of speculation and with search depth 6. The results in Table 3 
show that three players, Material, Ga3, and Tdl2b, profited from using 
OM search, but that the two other players, Default and Ngnd6, did not profit 
and played worse than in the first tournament. 

Fourth tournament: OM extended. South was using OM search with one ply 
of speculation as in the third tournament, but in this tournament it was allowed 
to search two ply deeper. The α-β probes were still restricted to depth 6. This 
means that South had better knowledge of the game than North, a situation 
that is comparable to the second tournament. Table 4 shows that South was not 
able to profit fully from the extra search depth. Although all players performed 
better than in the third tournament where they were given a depth of 6 ply, only 
Material played better than in the second tournament. This indicates that 
searching deeper for yourself in OM search is not sufficient for success. 

Fifth tournament: OM with perfect opponent prediction. The fifth tournament 
gave South a different advantage: it was allowed to extend the α-β probes to 
depth 7. The search depth (for the own evaluation) was 6. In this way, South not 
only had perfect knowledge of the opponent’s evaluation function, but South 
could also predict almost perfectly what North would do at the next move: 
the effective search depth of the α-β probes (6, because the probes started 
at depth 1) was exactly the same as the search depth of North. In the 
case of equally evaluated moves, South selected the move with the lowest own 
evaluation. This was not necessarily the move that North would play. Table 5 
gives the results of this tournament. All players except Default profited from 
this advantage and played better than in tournament 1, albeit worse than in 
the second tournament, in which they just searched deeper. The advantage also 
gave worse results than the advantage in tournament 4, except for player 
Ngnd6. From these results we can infer that knowing exactly the moves of 
the opponent does not help if the own judgement is too weak. 

Sixth tournament: OM perfect. The sixth tournament combined the advantages 
of the fourth and fifth tournament for South. The search depth for the own 



S\N         Material   Default   Ga3   Tdl2b   Ngnd6   Score
Material        -         55      35     19      18     127
Default        48          -      54     30      28     160
Ga3            55         61       -     36      30     182
Tdl2b          69         65      57      -      39     230
Ngnd6          79         73      75     60       -     287

Table 1. Results of the first tournament between 5 evaluation functions for Bao. Each cell 
shows the number of games won (out of 100) by South (the row) against North (the column). 
The column on the right shows the number of games won (out of 400) by each evaluation function 
when playing South. 

S\N         Material   Default   Ga3   Tdl2b   Ngnd6   Score
Material        -         57      62     40      32     191
Default        71          -      52     49      34     206
Ga3            80         75       -     62      49     266
Tdl2b          86         76      69      -      57     288
Ngnd6          88         76      80     70       -     314

Table 2. Results of the second tournament between 5 evaluation functions for Bao. Both sides 
use α-β, but South searches 2 ply deeper (8) than North (6). 



S\N         Material   Default   Ga3   Tdl2b   Ngnd6   Score
Material        -         57      50     30      24     161
Default        46          -      46     26      25     143
Ga3            59         57       -     40      35     191
Tdl2b          78         64      60      -      46     248
Ngnd6          71         58      66     61       -     256

Table 3. Results of the third tournament between 5 evaluation functions for Bao. South uses 
OM search with perfect knowledge of the opponent’s evaluation function. The search depth is 6 
for both sides. 



S\N         Material   Default   Ga3   Tdl2b   Ngnd6   Score
Material        -         60      64     49      39     212
Default        63          -      47     44      41     195
Ga3            70         66       -     57      40     233
Tdl2b          80         69      70      -      56     275
Ngnd6          84         68      71     59       -     282

Table 4. Results of the fourth tournament between 5 evaluation functions for Bao. South uses 
OM search with perfect knowledge of the opponent’s evaluation function. The search depth is 8 
for South, with α-β probes to depth 6, and the search depth is 6 for North. 



evaluation was 8 for South and the α-β probes for the opponent extended to 
depth 7. The results in Table 6 show that the power of South was significantly 
increased. All players performed better than in tournament 1, and all players 
except Ga3 also played better than in tournament 2. The results of Ga3 were 
only slightly worse than in tournament 2. 

Seventh tournament: OM with strict risk management. In the seventh and last 
tournament, South applied OM search with strict risk management. South only 

S\N         Material   Default   Ga3   Tdl2b   Ngnd6   Score
Material        -         59      57     31      27     174
Default        50          -      48     32      23     153
Ga3            53         64       -     36      30     183
Tdl2b          69         73      66      -      40     248
Ngnd6          77         82      77     63       -     299

Table 5. Results of the fifth tournament between 5 evaluation functions for Bao. South uses 
OM search with perfect knowledge of the opponent’s evaluation function. The search depth is 6 
for both sides, but South uses α-β probes to depth 7. 



S\N         Material   Default   Ga3   Tdl2b   Ngnd6   Score
Material        -         76      69     54      58     257
Default        59          -      66     48      46     219
Ga3            75         77       -     56      55     263
Tdl2b          79         88      83      -      57     307
Ngnd6          80         88      85     68       -     321

Table 6. Results of the sixth tournament between 5 evaluation functions for Bao. South uses 
OM search with perfect knowledge of the opponent’s evaluation function. The search depth is 8 
for South, with α-β probes to depth 7, and the search depth is 6 for North. 



S\N         Material   Default   Ga3   Tdl2b   Ngnd6   Score
Material        -         66      49     32      33     180
Default        49          -      45     37      31     162
Ga3            63         62       -     35      33     193
Tdl2b          72         66      64      -      46     248
Ngnd6          75         71      76     69       -     291

Table 7. Results of the seventh tournament between 5 evaluation functions for Bao. South 
uses OM search with perfect knowledge of the opponent’s evaluation function and strict risk 
management. The search depth is 6 for both sides. 



deviated from the strategy that α-β search imposed if the move that OM search 
advised had the same minimax value. The search depth was equal to that of the third 
tournament. Since it occurred relatively often in Bao that multiple moves at 
the same position had the same minimax value, South did have some room to 
speculate. The results in Table 7 show that this approach was successful too. 

Tournament         Material   Default   Ga3   Tdl2b   Ngnd6
1: α-β plain          127       160     182    230     287
2: α-β extended       191       206     266    288     314
3: OM plain           161       143     191    248     256
4: OM extended        212       195     233    275     282
5: OM perf. opp.      174       153     183    248     299
6: OM perfect         257       219     263    307     321
7: OM no risk         180       162     193    248     291

Table 8. Overview of the seven Bao tournaments. 



All players performed better than in the first tournament and also better (or 
equally well in the case of Tdl2b) than in the third tournament. 
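
The move selection under strict risk management can be sketched as follows. The function and argument names are hypothetical; the two dictionaries are assumed to hold, for every legal move, the value found by α-β search and the value found by OM search.

```python
def choose_move(moves, minimax_value, om_value):
    """Play OM search's move only if it costs nothing in minimax terms."""
    ab_move = max(moves, key=lambda m: minimax_value[m])
    om_move = max(moves, key=lambda m: om_value[m])
    if minimax_value[om_move] == minimax_value[ab_move]:
        return om_move          # safe deviation: same minimax value
    return ab_move              # otherwise follow alpha-beta


moves = ["a", "b", "c"]
minimax_value = {"a": 5, "b": 5, "c": 3}
risky = choose_move(moves, minimax_value, {"a": 6, "b": 9, "c": 12})
safe = choose_move(moves, minimax_value, {"a": 6, "b": 9, "c": 2})
```

In the first call OM search prefers move `c`, which has a lower minimax value, so the α-β move `a` is played. In the second call OM search prefers `b`, which is equal to `a` in minimax value, so the deviation is accepted.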

A summary. Table 8 summarizes the results of the seven tournaments. Each 
cell contains the final score of a tournament (400 games), playing South. On all 
rows the scores are increasing from left to right (except for the first two columns). 
This means that the order of quality for the evaluation functions indeed is as 
follows: (Material, Default) < Ga3 < Tdl2b < Ngnd6. The ordering 
of Material and Default is unclear, but both evaluation functions are poor. 
The table shows that if only the search depth is increased (4: OM extended) or 
only the prediction of the opponent is improved (5: OM perf. opp.), the results 
are not as good as just using α-β with two additional ply of search. When both 
methods were combined (6: OM perfect), the results were better. Furthermore, 
the table shows that using OM search with strict risk management (7: OM no 
risk) led to better results than using plain OM search and plain α-β. 
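
The observations in this summary can be checked mechanically against the scores of Table 8. The row totals computed below are an aggregate introduced only for this illustration; they do not appear in the tables.

```python
# Scores from Table 8, columns: Material, Default, Ga3, Tdl2b, Ngnd6.
TABLE8 = {
    1: [127, 160, 182, 230, 287],   # alpha-beta plain
    2: [191, 206, 266, 288, 314],   # alpha-beta extended
    3: [161, 143, 191, 248, 256],   # OM plain
    4: [212, 195, 233, 275, 282],   # OM extended
    5: [174, 153, 183, 248, 299],   # OM perf. opp.
    6: [257, 219, 263, 307, 321],   # OM perfect
    7: [180, 162, 193, 248, 291],   # OM no risk
}


def ordering_holds(scores):
    """(Material, Default) < Ga3 < Tdl2b < Ngnd6 within one tournament."""
    material, default, ga3, tdl2b, ngnd6 = scores
    return max(material, default) < ga3 < tdl2b < ngnd6


def total(tournament):
    """Sum of the five South scores of one tournament."""
    return sum(TABLE8[tournament])


ordering_everywhere = all(ordering_holds(s) for s in TABLE8.values())
totals = {t: total(t) for t in TABLE8}
```

On these totals, tournaments 4 and 5 stay below tournament 2, tournament 6 exceeds it, and tournament 7 exceeds both tournaments 1 and 3.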

Deviations. The last overview, in Table 9, provides insight into the number of 
times that OM search deviated from the α-β-search strategy. The table shows 
that searching more deeply for the own evaluation had a larger effect than 
searching more deeply for the prediction of the opponent. The table also shows 
that the number of deviations was larger in tournament 4 than in tournament 6. 
Since the results of tournament 4 were less good than the results of tournament 
6, it seems that an incorrect prediction of the opponent led to extra deviations 
that did not contribute to a positive outcome. 

7. Conclusion 

The experiments described in this paper are a follow-up to earlier experiments 
with OM search in other game domains. In Donkers et al. (2003) we described 
experiments in Lines of Action and in the chess endgame KQKR. The experi- 
ments in Lines of Action showed that OM search with evaluation functions of 
poor quality led to bad results. The experiments in the chess endgame KQKR 

Tournament         Material    Default     Ga3         Tdl2b       Ngnd6
3: OM plain        3.27±2.1    3.36±1.8    2.38±1.8    2.47±1.7    2.21±1.6
4: OM extended     4.85±2.5    5.43±2.3    4.90±2.4    4.46±2.3    4.55±2.3
5: OM perf. opp.   3.30±1.9    3.12±1.8    2.52±1.6    2.43±1.6    2.26±1.6
6: OM perfect      4.35±2.1    5.14±2.5    4.47±2.2    3.93±1.9    3.98±1.9
7: OM no risk      1.83±1.5    0.82±1.0    0.29±0.5    0.73±0.9    0.50±0.7

Table 9. Overview of the average number of moves per game in which the move that OM search 
selected differed from the move that α-β search (with search depth 6) suggested. The standard 
deviation ranges between 0.5 and 2.5. The average number of moves per game for South is 19.8 
over all games in the tournaments. 



showed that OM search with a perfect evaluation function (i.e., an endgame 
database) for max can be useful, but the results are not conclusive. 

The Bao experiments in this contribution were designed to identify those 
factors that influence the success or failure of OM search. Although the exper- 
iments were not encyclopedic and therefore did not produce firm qualifications 
of these factors, many effects are statistically significant. In all, the Bao exper- 
iments provide a good insight into the working of OM search. For instance, 
it appears that a combination of adequate opponent prediction and extended 
search depth is needed for good results. Of these two factors, the extended 
search depth seems to be more important than the good prediction. Moreover, 
the quality of the evaluation functions appears to be important for the effect of 
OM search. For plain OM search the results were not good for most of the play- 
ers because the evaluation functions do not obey the admissibility requirement. 

A generalisation of the results to other games leads to the statement that the 
search method can only be applied successfully when additional resources (e.g., 
search time) are available. The additional search time (in comparison with the 
opponent) must either be spent for the prediction of the opponent’s move, or for 
the risk management. If these additional resources are not available, OM search 
cannot with certainty be applied successfully. 

In order to measure the effects of opponent prediction and extended search 
more precisely, the sample size should be increased and more game details 
should be analysed, such as the number of times that the predicted move differs 
from the move played by the opponent. Furthermore, a deeper study of the 
properties of the trained evaluation functions and the matches between players 
themselves might provide more background information. A final suggestion 
for future research is to investigate the possibilities for risk management more 
deeply since this seems a promising approach. 
Appendix

The following table gives the 100 start positions used in the Bao experiments. 
The positions are generated by playing 10 random legal moves for every player 
from the official Bao opening position. Each row gives the contents of the holes 
of one position. The numbering of the holes is according to De Voogt (1995). 
The last two columns indicate whether South and North have an active house. 
Abstract Kriegspiel is a chess variant invented to make chess more similar to real warfare.
In a Kriegspiel game the players have to deal with incomplete information because
they are not informed of their opponent’s moves. Each player tries to guess the
position of the opponent’s pieces as the game progresses by trying moves that
can be either legal or illegal with respect to the real situation: a referee accepts
the legal moves and rejects the illegal ones. However, the latter are most useful
to gain insight into the opponent’s position. While in the past this game has been
popular in research centres such as the RAND Institute, currently it is played
mostly over the Internet Chess Club.

The paper describes the rationale and design of a Kriegspiel program to play
the ending for King and Rook versus King. This ending has been shown
theoretically to be won for White; however, no programs exist that play
the related positions perfectly. We introduce an evaluation function to play these
simple Kriegspiel positions, and evaluate it.

Keywords: Kriegspiel, Eastern rules, Western rules, metaposition 

1. Introduction 

The game of chess has been widely studied because it is a microcosm that 
mirrors decision making in real-world situations. However, a basic limit of 
chess as a field for studying decision making is that decisions by players have 
nothing to do with uncertainty in the sense in which the term is used in game 
theory, since the goal and the best strategy for each player can be computed 
easily and completely. 

The game of Kriegspiel is a chess variant invented around 1896 to make 
chess more similar to real warfare. It involves incomplete information: both 
the premises and the consequences of a decision are partially unknown, thus 
it is considered a complex game because of the asymmetry in the knowledge 
available to the players as the game progresses. In fact, when a player makes 
an illegal move, from his failure he can infer data that cannot be inferred by 
his opponent as well. Thus, in general, during a Kriegspiel game each player
knows what he knows, but he does not know what his opponent knows.

Kriegspiel is an interesting game in several ways. First, it is based on the same
rules as chess, but it has a completely different (and not well studied) theory.
It is a game of imperfect information, such as Poker. However, Kriegspiel has
no stochastic element, which makes it different from Poker. To play Kriegspiel
well we have to use logic and the mathematics of probability.

At the moment there are no programs that play a reasonable Kriegspiel game.
On the Internet Chess Club (ICC) a couple of programs are available which
are able to play Kriegspiel; however, none of these programs is among the best
players (on ICC there are several hundred Kriegspiel players, and every day
they play hundreds of games).

We recall that a number of papers have studied some aspects of Kriegspiel 
or Kriegspiel-like games. Below we provide some instances of related work. 
Ferguson (1992, 1995) analyses the endings KBNK and KBBK, respectively. 
Ciancarini, DallaLibera, and Maran (1997) describe a rule-based program to 
play the KPK ending according to some principles of game theory. Sakuta 
and Iida (2000) describe a program to solve Kriegspiel-like problems in Shogi 
(Japanese Chess). Bud et al. (2001) describe an approach to the design of a 
computer player for a subgame of Kriegspiel, called Invisible Chess. 

In this paper we explore some issues of the ending KR vs K in Kriegspiel. 
We aim to design a program that will be a prototype component of a multi- 
agent system able to play Kriegspiel. We describe how we have built such a 
component, and how we evaluate its behaviour, with the purpose to improve its 
playing ability. 

This paper has the following structure. In Section 2 we describe the basic 
rules of Kriegspiel, including a study of its main variants. In Section 3 we 
introduce the theory of the KRvsK ending in Kriegspiel. In Section 4 we 
describe our search algorithm. In Section 5 we describe our evaluation function:
it is specific for this ending, but to our knowledge it is the first time an evaluation
function for playing Kriegspiel has been defined. In Section 6 we describe how
we use a transposition table to support the search across a tree of metapositions. 
Finally, in Section 7 we evaluate our approach. 

2. The Rules 

Perhaps the lack of standard rules has been an obstacle to the diffusion of 
Kriegspiel as a research subject. In fact, there are several different sets of rules, 
basically classified into two families as Eastern rules (widespread in UK and 
Eastern US) and Western rules (widespread in Western US) (Pritchard, 1994;
Li, 1994). The rules given by J.D. Wilkins in Williams (1950) have been used
for years in the RAND Institute. The ICC rules are derived from the RAND
rules. However, ICC managers introduced some variants which make the play
over the Internet slightly more difficult.

A Kriegspiel player tries a move selected among the set of his pseudo-legal
moves, including possible pawn captures. For instance, in Diagram 1 the
possible tries for White are Bb2, Ba3, Bd2, Be3, Bf4, Bg5, Bh6, Kd1, Kd2,
Ke2, Kf2, Kf1, d5, dc5, de5.

Diagram 1. Possible tries.

The referee, who knows the list of legal moves for both sides, answers all
tries with one of the following six messages.

End of game If the list of legal moves is empty the position is checkmate or
stalemate, and the referee announces the corresponding finish.

Move accepted If the try is legal, the referee says “White moved” (or “Yes”)
and gives no further information. We denote this situation also as “Silent
referee”, because he gives no useful information.

Illegal move The try selected by White might be illegal on the referee’s board.
For instance, in the position of Diagram 2 (as it is on the referee’s board)
White could try Bh6. The referee says “Illegal move” (or “No”) and White
infers that either the diagonal to h6 is obstructed by an enemy piece, or the
Bishop is pinned by a black major piece on a1 or b1.

Impossible move The message “Impossible move” is given when a player tries
a move outside his set of pseudo-legal moves. In Diagram 2 an impossible
try could be Ke3.

Check If a move is accepted and gives check, the referee announces the check
and its direction (row, column, major diagonal, minor diagonal, Knight). In
the example, the move Bd2 gets the answer “Check on major diagonal”.

Capture The referee announces all captures, but he says only on which square
the capture takes place, and says nothing about the capturing or captured
piece. In the example, the move Bf4 gets the answer “Capture on f4”.

Diagram 2. The referee’s board.
This list describes the basic messages from the referee. However, in all
Kriegspiel versions there is a special treatment of positions where captures by
Pawns are possible (we continue the numbering of messages).

Are there any? In the original set of rules (Eastern rules) a player could ask
before each move “Are there any?”, intending “Are there any captures
by my Pawns?”. The referee answers “No” if no capture is possible,
or “Try!” if one or more captures are available. With RAND rules the
referee announces before each move all possible pawn captures, naming
the squares where they can take place. In the set of rules which is used
on the Internet Chess Club (Western rules) the referee announces before
each move how many pawn captures are available.

Diagram 3 shows the differences among the different sets of rules.

Eastern rules : The referee says: “White to move”. White can choose to ask 
“Are there any?”; if in the above position White asks the question, the 
referee says “Try!”; White then has to try at least one capture out of three, 
namely ab4, cb3, or cd3. 

RAND rules : before White moves the referee says to both players “possible 
pawn capture on b4 ”. White is not obliged to capture. 

Western rules, ICC : before White moves the referee says to both players 
“possible one pawn capture ”. White is not obliged to capture. 




Diagram 3. Different pawn capture rules.

If now White moves his Pawn to c4,

Eastern rules : the referee announces: “Black moves”;

RAND rules : the referee announces: “possible pawn captures on a3 and c3”;

Western rules, ICC : the referee announces: “possible two pawn captures”.

We report these differences for completeness, but we also note that they are
not important for endings without Pawns. More important when dealing with
endings is the fact that in the original form of Kriegspiel no 50-move rule is
included; on ICC, instead, the 50-move rule is enforced.

As a final remark, we note that there are several other forms of Kriegspiel- 
like games, like Dark Chess, Invisible Chess, Stealth Chess, and others. They 
are all based on some form of invisibility. We plan to report the features of this 
family of games in a future paper. 
3. KR vs K in Kriegspiel

The ending KR vs K in chess is won in at most 16 moves starting from 
any position. This chess ending is quite easy to study by brute force, because 
excluding symmetric positions only 28,000 positions have to be evaluated, as 
shown in Clarke (1977). 

The ending KR vs K in Kriegspiel is also won. However, according to H.A.
Adamson, who published some analysis in the magazine Chess Amateur in 1923
and 1926, it can take even 40 moves to give checkmate to Black. More recently,
this ending has been studied by Leoncini and Magari (1980) and Boyce (1981).
The studies proved that this ending is algorithmically won, i.e., White can force
mate against any defense, even the most clairvoyant; there are, instead, several
endings (e.g., KP vs K or KBB vs K) which are only probabilistically won, that
is, Black has a chance to draw (or, equivalently, draws if the referee suggests the
right move to Black) (cf. Ferguson 1992, 1995; Ciancarini et al., 1997).

Below we start with developing an algorithm for KRK. To this end we define
the notion of metaposition. A metaposition is a position describing a set of
positions: this can be done graphically. In our case we have diagrams with
several black Kings, meaning that the black King’s position is uncertain.
Subsequently we can evaluate how many KRK metapositions we have to deal
with. The number of metapositions for this ending can be approximated by
fixing the position of the white pieces and considering the number of ways to
choose n BK positions among the remaining squares. If we assume as a worst
case for White Ra1 and Kb1, we have 52 possible positions which are not
controlled by White. The possible metapositions are then
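The count described above can be checked with a short sketch. It assumes that the worst case places the white Rook on a1 and the white King on b1, and that every non-empty subset of the free squares counts as one metaposition; the helper names `sq` and `free_squares` are ours and only illustrative.

```python
from math import comb

FILES = "abcdefgh"

def sq(name):
    """Convert a square name such as 'b1' to (file, rank) indices in 0..7."""
    return FILES.index(name[0]), int(name[1]) - 1

def free_squares(wk, wr):
    """Squares neither occupied nor attacked by the white King and Rook."""
    blocked = {wk, wr}
    # Squares next to the white King (off-board ones are harmless here).
    for df in (-1, 0, 1):
        for dr in (-1, 0, 1):
            blocked.add((wk[0] + df, wk[1] + dr))
    # Squares on the Rook's row and column, up to the white King if it blocks.
    for df, dr in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        f, r = wr[0] + df, wr[1] + dr
        while 0 <= f < 8 and 0 <= r < 8 and (f, r) != wk:
            blocked.add((f, r))
            f, r = f + df, r + dr
    board = {(f, r) for f in range(8) for r in range(8)}
    return board - blocked

free = free_squares(sq("b1"), sq("a1"))
# Choose n black-King squares among the free ones, for every n >= 1.
metapositions = sum(comb(len(free), n) for n in range(1, len(free) + 1))
print(len(free), metapositions)
```

Under these assumptions the sum equals 2^52 - 1, roughly 4.5 x 10^15 metapositions, which illustrates why a brute-force study is out of reach.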





Diagram 4 . White moves and wins 
(Adamson, 1923). 



For these positions, the reflections of the BK position with respect to the
diagonal a1 to h8, as described in Bain (1994), do not decrease the numerical
complexity of the problem. So we are not able to study this ending completely
by brute force.

Diagram 4 shows a typical ending. This 
diagram shows a metaposition : the double 
black King means that White is not sure 
whether the black King is on a8 or on b8. 
Alas, he has to find the best (most rapid) route 
to checkmate. 
White tries 1.Kc7:

(1) if the referee says “No” then White tries 1.K?6 or 1.Rc2, then mate;

(2) if the referee says “Yes” then 2.Ra1#.

Diagram 5. White moves and wins (Adamson 1923).

In Diagram 5, White tries 1.Kc7:

(1) if the referee says “Yes” then the BK is on a8 and 1...Ka7 2.Rd6 Ka8 3.Ra6#.

(2) if the referee says “No” then White plays 1.Rc7 and:

(2a) with a silent referee White identifies the black King on a8 and the mate
is very simple;

(2b) if the referee says “check” then 2.Kd7:

(2b1) if “No” the BK is on d8, then 2.Rc1 Ke8 3.Rf1 Kd8 4.Rf8#;

(2b2) if “Yes” Black played 1...Kb8, then 2.Kd7 Ka8 3.Kc6 Kb8 4.Kb6 Ka8 5.Rc8#.

A general algorithm for any position, in which White knows nothing about the
BK whereabouts, is given in Leoncini and Magari (1980). The procedure
includes several phases.

In the first phase White has to configure his own pieces as in Diagram 6.

Diagram 6. The first phase.

The second phase consists of looking for the BK by moves like Kd2, Re2, Kd3,
Re3, ..., Kd8, Re8: if the referee never says “check” then the BK is in the
left-hand halfboard, otherwise when a check occurs the BK is in the right-hand
halfboard, and White’s task will be easier to fulfil. We assume the first
hypothesis in the metaposition shown in Diagram 7.

Diagram 7. The second phase.

Interestingly, Kriegspiel metapositions have been compared by Magari to
probability waves as in Quantum Physics. According to such a metaphor, the
black King is not a body with a precise position, but a wave, or a set of
possibilities. The white King has to destroy such a wave entering it and
reducing the freedom of the black King.

Diagram 8. How white mates.

In the final position of Diagram 8 White mates with Ra8#.

If at any time the referee says “Illegal move”, White will find the BK earlier, 
and will be able to use his Rook to restrict further the space available to Black. 

3.1 Exploiting the Referee’s Answers 

In any KRK ending, when White has to try a move, there are three possible 
situations. 

1 The referee’s answer is ‘silent’. This allows us only to update our refer-
ence board, clearing the squares around the WK and along the WR row
and column.

2 The WR can check the BK, in that case the player updates his reference 
board and assumes that the BK possible position is on the WR row or 
column. 

3 A try may be illegal because the WK tries to go in a square which is under 
attack or because the WR is going across an occupied square. 

Assume we are in the situation shown in the leftmost position in Diagram
9. If White moves Re3 we distinguish two cases: (1) the referee’s answer is
‘silent’ (second position) or (2) the referee says a check has occurred (third
position). If White moves Kd5 two cases can be outlined too, with a ‘silent’
answer or with an ‘illegal’ answer. In the rightmost position we show the result
of obtaining the answer ‘illegal’, since the case in which we get a silent answer
is similar to the Rook’s one.

Diagram 9. Analysis of a referee’s answers.
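The three situations above can be sketched as a filter on the set of squares where the black King may stand. This is a minimal illustration, not the program's code: the function names, the square encoding, and the restriction of the ‘illegal’ case to rejected King moves are our assumptions (a rejected Rook move across an occupied square is not modelled).

```python
FILES = "abcdefgh"

def sq(name):
    """Convert a square name such as 'c2' to (file, rank) indices in 0..7."""
    return FILES.index(name[0]), int(name[1]) - 1

def adjacent(a, b):
    """True when two distinct squares touch (king distance 1)."""
    return a != b and max(abs(a[0] - b[0]), abs(a[1] - b[1])) == 1

def rook_lines(wr, wk):
    """Squares the Rook attacks on its row and column; the white King blocks."""
    seen = set()
    for df, dr in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        f, r = wr[0] + df, wr[1] + dr
        while 0 <= f < 8 and 0 <= r < 8 and (f, r) != wk:
            seen.add((f, r))
            f, r = f + df, r + dr
    return seen

def update(candidates, wk, wr, answer, tried=None):
    """Filter the possible black-King squares after one referee answer.

    wk and wr are White's squares after an accepted move, or the unchanged
    squares after an 'illegal' answer; tried is the rejected King destination.
    """
    if answer == "silent":
        lines = rook_lines(wr, wk)
        return {b for b in candidates
                if b not in (wk, wr) and not adjacent(b, wk) and b not in lines}
    if answer == "check":
        # The black King must stand on the Rook's row or column.
        return candidates & rook_lines(wr, wk)
    if answer == "illegal":
        # The King's destination is guarded, so the black King touches it.
        return {b for b in candidates if adjacent(b, tried)}
    raise ValueError(answer)

# Hypothetical reference board: WK c2, WR f2, black King on a1, a2, a3 or e3.
cands = {sq(n) for n in ("a1", "a2", "a3", "e3")}
after_illegal = update(cands, sq("c2"), sq("f2"), "illegal", tried=sq("b1"))
after_silent = update(cands, sq("b1"), sq("f2"), "silent")
```

In this example an ‘illegal’ answer to the try Kb1 leaves only a1 and a2, while a ‘silent’ answer leaves a3 and e3.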

4. The Search Algorithm 

When drawing out the search algorithm we are first led to a problem caused 
by the fact that the move is described only by the referee’s answer. This implies 
that the evaluation of a move can be made only with respect to the referee’s 
answers, using some probabilistic reasoning. 

Consider, for example, a situation where the WK is on c2, the white Rook
is on f2, and Black’s positions traced in the white player’s reference board are
on a1, a2, a3, or e3 with a likelihood of 1/4 each. If the WK moves to b1
and receives an ‘illegal’ answer, that move will be a good move, decreasing
the uncertainty (leftmost metaposition in Diagram 10), but if the move receives
a ‘silent’ answer White will reach a state of danger, where the WR risks being
captured (rightmost position in Diagram 10). So White should not play such a
move.





Diagram 10. Analysis of metapositions. 

Our solution consists of making a first evaluation during the generation of 
the pseudo-legal moves considering both cases, either ‘illegal’ or ‘check’ and 
‘silent’ answers, and inserting in the possible moves vector the one with the 
lowest value. In other words the player assumes the worst case and makes 
available to the search algorithm only one answer per move. In this manner 
the number of moves we are handling becomes more similar to that of classic
chess.

In Kriegspiel the player is in the dark about his opponent’s position, so a
minmax-like search cannot be executed in this context unless we find how to
represent all possible BK moves. Simply adding a new layer to the algorithm,
calculating for each of White’s legal moves all possible BK positions and
repeating the procedure for each of those positions in a minmax way, has an
exponential cost that forces us to choose some alternative.

The way we have chosen to represent the invisible BK on White’s reference 
board is to define a metaposition which is a set of possible BK squares with 
the same likelihood. Also, we define an uncertainty index as the count of the 
possible positions of a metaposition, as in Sakuta and Iida (2000). In some 
sense White has to play against an unspecified number of black Kings, that can 
move simultaneously. It is quite simple to define a metamove as a move from 
one metaposition to another metaposition. Playing a metamove corresponds to 
playing all the moves for each black position of the metaposition. This trick 
allows us to use an algorithm like minmax or similar, where we use a metamove 
generator. We represent a metaposition as an array of possible positions. 
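A metamove generator along these lines can be sketched as follows. The representation (a set of squares rather than an array), the helper names, and the choice to report a possible Rook capture separately instead of generating it as a move are our assumptions for illustration.

```python
FILES = "abcdefgh"

def sq(name):
    """Convert a square name such as 'e3' to (file, rank) indices in 0..7."""
    return FILES.index(name[0]), int(name[1]) - 1

def adjacent(a, b):
    """True when two distinct squares touch (king distance 1)."""
    return a != b and max(abs(a[0] - b[0]), abs(a[1] - b[1])) == 1

def rook_lines(wr, wk):
    """Squares the Rook attacks on its row and column; the white King blocks."""
    seen = set()
    for df, dr in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        f, r = wr[0] + df, wr[1] + dr
        while 0 <= f < 8 and 0 <= r < 8 and (f, r) != wk:
            seen.add((f, r))
            f, r = f + df, r + dr
    return seen

def metamove(candidates, wk, wr):
    """Play every legal black-King move from every candidate square at once."""
    lines = rook_lines(wr, wk)
    result = set()
    for bf, br in candidates:
        for df in (-1, 0, 1):
            for dr in (-1, 0, 1):
                t = (bf + df, br + dr)
                if (df, dr) == (0, 0) or not (0 <= t[0] < 8 and 0 <= t[1] < 8):
                    continue
                # Rook captures are left out here; see rook_en_prise below.
                if t in (wk, wr) or adjacent(t, wk) or t in lines:
                    continue
                result.add(t)
    return result

def rook_en_prise(candidates, wk, wr):
    """True when some candidate black King could take an unprotected Rook."""
    return not adjacent(wk, wr) and any(adjacent(b, wr) for b in candidates)

# Hypothetical metaposition: WK b1, WR f2, black King on a3 or e3.
cands = {sq("a3"), sq("e3")}
next_meta = metamove(cands, sq("b1"), sq("f2"))
danger = rook_en_prise(cands, sq("b1"), sq("f2"))
```

Here the two candidate Kings expand to six squares (a4, b3, b4, d3, d4, e4), and `danger` is true because a King on e3 could capture the unprotected Rook on f2.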

One distinctive aspect to note is that we are changing the meaning of search
depth. It now refers only to White’s branching factor, since the generation of a
metaposition from another involves the introduction of a single edge. Diagram
11 describes the state reached from a reference board where the BK is assumed
to be on g2 or g5 with a likelihood of 1/2 each.

Diagram 11. Representing metapositions with likelihood.

Figure 1 shows the pseudo-code describing the search algorithm.

Search Algorithm (int depth) {
    generate the white’s legal moves T;
    for each move j ∈ T
    {
        if (rook plays the move j)
            j.value = Min(evaluate(j, check), evaluate(j, silent));
        if (king plays the move j)
            j.value = Min(evaluate(j, illegal), evaluate(j, silent));
    }
    for each move j ∈ T
    {
        if (depth != 1) {
            makemove(j);
            generate the opponent’s metamove;
            if (!CheckHash(depth-1, &value))
                j.value += Search(depth-1);
            else
                j.value += value;
            unmakemove();
        }
        if (j.value > max)
            max = j.value;
    }
    RecordHash(depth, max);
    return max;
}

Figure 1. The search algorithm.

The algorithm generates all legal white moves and for each resulting position
it evaluates both possible referee answers using an evaluation function we will
discuss later. So, for each possible position, it is able to distinguish between
‘check’ or ‘illegal’ and ‘silent’ answers and it marks the move with the worst
case according to the value returned by the evaluation function. If it has reached
the desired search depth it simply returns the max move’s value, otherwise it
plays each move and in each metaposition obtained it makes the metamove,
then it decrements the depth of search and it recursively calls itself; after that, it
retracts the move played and adds to the move’s value the value which is returned
by the recursive call. Finally, it updates the max on that particular search depth.

A move’s value is modified during the path that the algorithm is analysing. If 
we did not make such updates, a move would obtain a good vote even crossing 
bad states, where, as an example, we run the risk of losing the Rook. Figure 2 
shows the search tree which describes a hypothetical visit. The first evaluation 
is on the right of the node and the updated value of the move is on the left; the 
bold type indicates the best move. 
Figure 2. A depth-2 search tree.

If we did not add the static evaluation value to the recursive value, at the
first depth, moves would respectively obtain -524, 313 and 272; so the second
move (which has a bad static value) would be chosen by the search algorithm,
while the third move (which has the greatest static value) would be discarded.
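The effect of accumulating the static value along the path can be sketched on a toy tree. The recursive values -524, 313 and 272 are those quoted above; the static values 100, -400 and 300, the node names, and the callback-style interface are hypothetical and chosen only to reproduce the situation described. The transposition table is omitted.

```python
def search(meta, depth, moves, worst_case_value, play):
    """Return (best accumulated value, best move) from metaposition meta.

    Each move is first scored with its worst-case static value; when depth
    remains, the best value found below the move is added to that score.
    """
    best_value, best_move = float("-inf"), None
    for move in moves(meta):
        value = worst_case_value(meta, move)
        if depth > 1:
            child_value, _ = search(play(meta, move), depth - 1,
                                    moves, worst_case_value, play)
            value += child_value
        if value > best_value:
            best_value, best_move = value, move
    if best_move is None:      # no move available from this node
        return 0, None
    return best_value, best_move

# Toy tree: node -> {move: (static worst-case value, child node)}.
TREE = {
    "root": {"a": (100, "A"), "b": (-400, "B"), "c": (300, "C")},
    "A": {"a1": (-524, "A1")},
    "B": {"b1": (313, "B1")},
    "C": {"c1": (272, "C1")},
}

def moves(meta):
    return list(TREE.get(meta, {}))

def worst_case_value(meta, move):
    return TREE[meta][move][0]

def play(meta, move):
    return TREE[meta][move][1]

result = search("root", 2, moves, worst_case_value, play)
```

With accumulation the third move wins with 300 + 272 = 572; using the recursive values alone, the second move (313) would have been preferred despite its bad static value.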

5. The Evaluation Function 

We will implement the evaluation knowledge using a weighted linear function,
as follows:

$$\mathrm{Evaluate}(S) = c_1 f_1(S) + c_2 f_2(S) + \dots + c_5 f_5(S) \qquad (2)$$

where $c_1, c_2, \dots, c_5$ are constants and $f_1(S), \dots, f_5(S)$ are functions which set
up the heuristic evaluation.

The first aspect we want to make sure of is to avoid having a position where
the WR risks being captured. For this reason the first boolean function $f_1(S)$
evaluates the possibility that the Rook is under attack; in that case it returns
FALSE.

Once we are certain that the Rook is safe, we try to bring the two Kings closer.
That means to let the WK patrol the board. Thus the second function $f_2(S)$
estimates the distance between the WK and all the possible BK positions, by
considering the furthest one. The way we calculate the distance is the sum of
columns and rows between the WK and the furthest BK. In Diagram 12 we show
an example where this distance is 10.

Diagram 12. Computing the distance WK-BK.

Let us assume that it is White’s turn to move, so the BK certainly is on one of
those quadrilateral regions with which the white Rook divides the board. The
aim of the white player is therefore to reduce all the regions’ areas that contain
the black Kings. Again the uncertainty about BK’s real position is a problem.
The third function $f_3(S)$ estimates which one of the four regions holds the BK
and tries to reduce its area. We define it as

$$f_3(S) = \mathrm{EvalArea}(S) = c \cdot (a_1 + a_2 + a_3 + a_4) \qquad (3)$$

where $c \in \{1, 2, 3, 4\}$ is the value which traces the number of quadrilaterals
that possibly contain the opponent’s King, and $a_i$ ($i = 1, \dots, 4$) represents the
number of the BK’s possible positions in each quadrilateral. As shown in Diagram
13, in the worst case where uncertainty is maximal, the function’s result is 180.
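The second and third features can be sketched as below. We assume that "the sum of columns and rows" means the Manhattan distance, and that squares on the Rook's own row or column belong to no region; the function names and the example position (Rook d4, King e5) are ours and need not match Diagrams 12 and 13.

```python
def furthest_distance(wk, candidates):
    """f2: rows plus columns between the white King and the furthest candidate."""
    return max(abs(wk[0] - f) + abs(wk[1] - r) for f, r in candidates)

def eval_area(wr, candidates):
    """f3 = c * (a1 + a2 + a3 + a4) over the four regions cut out by the Rook."""
    counts = {}
    for f, r in candidates:
        if f == wr[0] or r == wr[1]:
            continue  # on the Rook's row or column, so inside no region
        region = (f > wr[0], r > wr[1])
        counts[region] = counts.get(region, 0) + 1
    # c is the number of regions that may hold the black King.
    return len(counts) * sum(counts.values())

# Hypothetical position with maximal uncertainty: Rook d4, white King e5.
wr, wk = (3, 3), (4, 4)
candidates = {
    (f, r)
    for f in range(8)
    for r in range(8)
    if f != wr[0] and r != wr[1]                    # not attacked by the Rook
    and max(abs(f - wk[0]), abs(r - wk[1])) > 1     # not on or next to the King
}
area = eval_area(wr, candidates)
distance = furthest_distance(wk, candidates)
```

In this position 45 squares remain for the black King, spread over all four regions, so the area feature evaluates to 4 x 45 = 180, the worst-case value quoted above.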

Diagram 13. Computing $f_3(S)$.

The fourth function $f_4(S)$ is a boolean function which evaluates whether the
WR is on the squares around the WK; in that case it increases by one the
move’s value.

The fifth function $f_5(S)$ considers good moves those that push the BK toward
the board’s corner. For each position where the BK might be, $f_5(S)$ adds to the
move’s value the corresponding value from the matrix shown in Figure 3.

It is useful to note that the $f_5(S)$ function calculates a positive value, but in
order to evaluate the best move we have to minimize this value.

The same remark on the other functions leads us to the following evaluation
function:

$$\mathrm{Evaluate}(S) = -420 + 840 \cdot f_1(S) - f_2(S) - f_3(S) + f_4(S) + f_5(S)$$

where $c_1 = 840$ is a weight that gives $f_1(S)$ top priority.

$$\begin{pmatrix}
1 & 1 & 0 & 0 & 0 & 0 & 1 & 1 \\
1 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & -2 & -4 & -4 & -2 & 0 & 0 \\
0 & 0 & -4 & -4 & -4 & -4 & 0 & 0 \\
0 & 0 & -4 & -4 & -4 & -4 & 0 & 0 \\
0 & 0 & -2 & -4 & -4 & -2 & 0 & 0 \\
1 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \\
1 & 1 & 0 & 0 & 0 & 0 & 1 & 1
\end{pmatrix}$$

Figure 3. The simple numerical matrix used by $f_5(S)$.
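As an illustration, the matrix lookup and the weighted sum can be sketched as follows. The signs are taken verbatim from the evaluation function above; the function names, the boolean encoding of the first and fourth features, and the sample feature values are our assumptions.

```python
# Matrix of Figure 3, indexed as CORNER[rank][file]; it is symmetric under
# board flips, so the orientation of ranks and files does not matter.
CORNER = [
    [1, 1, 0, 0, 0, 0, 1, 1],
    [1, 0, 0, 0, 0, 0, 0, 1],
    [0, 0, -2, -4, -4, -2, 0, 0],
    [0, 0, -4, -4, -4, -4, 0, 0],
    [0, 0, -4, -4, -4, -4, 0, 0],
    [0, 0, -2, -4, -4, -2, 0, 0],
    [1, 0, 0, 0, 0, 0, 0, 1],
    [1, 1, 0, 0, 0, 0, 1, 1],
]

def corner_score(candidates):
    """f5: sum of the matrix entries over all possible black-King squares."""
    return sum(CORNER[r][f] for f, r in candidates)

def evaluate(rook_safe, f2, f3, rook_near_king, f5):
    """Weighted sum with the constants quoted in the text.

    rook_safe and rook_near_king stand for the boolean features f1 and f4,
    counted as 1 when true and 0 when false.
    """
    return -420 + 840 * int(rook_safe) - f2 - f3 + int(rook_near_king) + f5

# Hypothetical feature values for one metaposition.
safe = evaluate(True, 8, 180, True, 0)
unsafe = evaluate(False, 8, 180, True, 0)
```

The gap of 840 between `safe` and `unsafe` shows how the first weight dominates the other terms, whose combined magnitude stays well below that value.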

We finally add, after the search algorithm, a function, which catches check- 
mate cases and consequently avoids playing moves to stalemate states. 

6. The Transposition Table 

Since during the search we would cross states of the board previously
analysed, it is useful to avoid analysing them a second time. As we have seen,
the number of metapositions is extremely large and it is impossible to maintain
each of them in memory. A natural solution to the comparison between the
states involves the creation of a signature value, typically using Zobrist
(1970) keys.

We define a three-dimensional vector indexed on {KING, ROOK},
{WHITE, BLACK}, and on the number of squares; then we fill each element
with a random 64-bit number. To create a Zobrist key for a metaposition, we
set it to zero, then for each piece on the board we add it into the key via the
XOR operator. The pieces can be either the Kings or the Rook, and the black
King may appear several times.

This technique has the advantage of creating good hash keys, that are not 
related to the metaposition being keyed. If a single piece is moved, we obtain a 
value that is completely different. So, these keys do not collide often. Another 
good peculiarity is that we can manage Zobrist keys incrementally, improving 
the artificial player’s performance, as described by Moreland (2002). 
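A minimal sketch of such keys follows, assuming squares are numbered 0..63 and the table is a dictionary; the seed, the names, and the layout are illustrative and not taken from the program.

```python
import random

PIECES = ("KING", "ROOK")
COLOURS = ("WHITE", "BLACK")

rng = random.Random(2003)  # fixed seed so that keys are reproducible
ZOBRIST = {
    (piece, colour, square): rng.getrandbits(64)
    for piece in PIECES
    for colour in COLOURS
    for square in range(64)
}

def key_of(wk, wr, black_kings):
    """Zobrist key of a metaposition: XOR of one random number per piece."""
    key = 0
    key ^= ZOBRIST[("KING", "WHITE", wk)]
    key ^= ZOBRIST[("ROOK", "WHITE", wr)]
    for square in black_kings:      # the black King may appear several times
        key ^= ZOBRIST[("KING", "BLACK", square)]
    return key

def move_piece(key, piece, colour, src, dst):
    """Incremental update: XOR the piece out of src and into dst."""
    return key ^ ZOBRIST[(piece, colour, src)] ^ ZOBRIST[(piece, colour, dst)]

key = key_of(10, 13, {0, 8, 16})
moved = move_piece(key, "KING", "WHITE", 10, 1)
```

Because XOR is its own inverse, the incrementally updated key equals the key recomputed from scratch for the new metaposition, and the order in which pieces are added is irrelevant.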

We use the Zobrist keys to implement a transposition table, which is a large 
hash table that allows us to trace metapositions that we have met during the 
search. It is impossible to create a big data structure that includes all the 
metapositions, but in the event of collisions, i.e., when two states are mapped 
on the same vector’s element, we use the Zobrist keys to identify the correct 
one. 
CheckHash (int depth, int *value) {
    hash_element *hashpt = &table[(WRB.key % MHE)+MHE];
    if (hashpt->key == WRB.key)
        if (hashpt->depth >= depth) {
            *value = hashpt->value;
            return TRUE;
        }
    return FALSE;
}

RecordHash (int depth, int max) {
    hash_element *hashpt = &table[(WRB.key % MHE)+MHE];
    if (hashpt->key != WRB.key) {       /* new element: store key and value */
        hashpt->key = WRB.key;
        hashpt->value = max;
    } else if (hashpt->value > max)     /* known element: keep the smaller value */
        hashpt->value = max;
    hashpt->depth = depth;
}

Figure 4. Updating the hash table. WRB means White’s Reference Board and MHE is the
Max number of the Hash Elements into the table.

In Figure 1 we used two functions whose pseudo-code is shown in Figure
4. These two functions are used to store the elements into the transposition table
and to load them from it.

The CheckHash function does the load operation. If the element previously
stored is the one we have to analyse and it has been examined with a depth
greater than or equal to the required depth, then the element is loaded from the table.

The RecordHash function does the store operation. It inserts the key and the 
search depth into the table. When it is not saving a new element, it inserts the 
value only if this value is smaller than the previous one. That means that the 
metapositions are randomly divided into clusters. 
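The load and store operations can be sketched in a few lines. The table size, the `Entry` record, and the explicit `key` argument are our assumptions; the replacement policy follows the description above (a new element is stored as is, a known element keeps the smaller value).

```python
from dataclasses import dataclass

MHE = 1 << 16  # hypothetical maximum number of hash elements

@dataclass
class Entry:
    key: int
    value: int
    depth: int

table = [None] * MHE

def check_hash(key, depth):
    """Return the stored value if this key was searched deeply enough, else None."""
    entry = table[key % MHE]
    if entry is not None and entry.key == key and entry.depth >= depth:
        return entry.value
    return None

def record_hash(key, depth, value):
    """Store a result; for a known key keep the smaller of the two values."""
    index = key % MHE
    entry = table[index]
    if entry is None or entry.key != key:
        table[index] = Entry(key, value, depth)   # new element (or collision)
        return
    if entry.value > value:
        entry.value = value
    entry.depth = depth

record_hash(42, 2, 100)
first = check_hash(42, 2)          # stored at depth 2, so this is a hit
too_deep = check_hash(42, 3)       # depth 3 was never searched
record_hash(42, 3, 150)            # known key: the smaller value 100 is kept
second = check_hash(42, 3)
record_hash(42 + MHE, 1, 7)        # same slot, different key: overwrites
```

Python's `%` already returns a non-negative index for a positive modulus, so the `+MHE` offset of Figure 4 is not needed in this sketch.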

7. Tests 

We have executed a first test on 26,536 initial positions, randomly selected 
from the 175,168 legal positions of KRK endgame. Each initial position has 
the maximum uncertainty on White’s reference board, meaning that the BK has 
the maximum freedom in terms of possible squares. Black’s strategy always 
consists in playing the move that allows him to go away from the edge of the 
board. 
This test shows that 95.6% of the games are won by White, while 4.4% are not.
Of the latter, 75.9% are games that have been stopped for a loop, 24% are draws,
and 0.1% are stalemates, as shown in Table 1. The average number of moves
needed to give mate is 36, and the worst game played has been 117 moves long.

Result       number of games
mate         25372
loop         883
draw         279
stalemate    2

Table 1. The 26536 games’ result, during the first test.

In the histogram of Figure 5 we show the number of the matches won per
moves needed, with intervals of 5.

Figure 5. Won games and number of moves (first test).



In order to have a comparison, we executed a second test on all initial
positions using the referee’s point of view, namely we play this ending using
ordinary chess rules and our Kriegspiel evaluation function. During a match, if
the game either begins at or goes across some positions previously played, the
referee stops it and considers it won or lost, depending on the result of previous
games.

In this test 99.5% of the games are won, which is not bad but it shows that our
evaluation function is not perfect for ordinary chess. We show the entire results
in Table 2 and in Figure 6 we show the sets of won games and the number of
moves needed during the second test.

Result       number of games
mate         18469
loop         2
draw         122
stalemate    0

Table 2. The games’ result during the second test.

Figure 6. Won games and number of moves (second test).



Below we show how our program deals with the position of Diagram 5. If
we assume that the BK is on a8, the program plays efficiently (in the scores, I
means that the referee says illegal, C means check): 1. d6c7 a8b7{I} a8a7 2.
d7d6 a7b6{I} a7b7{I} a7a6{I} a7a8 3. d6a6{C} 1-0 {White mates}

If we assume that the BK is on c8, the program actions are also very effective, 
as it achieves the checkmate in 2 moves: 1. d6c7{I} d7f7 c8d7{I} c8c7{I} 
c8b7{I} c8d8 2. f7f8{C} 1-0 {White mates} 

Let us assume that the black King is on b8. After playing Kc7 and receiving
an ‘illegal’ answer, the program plays less precisely Rf7, and then it takes 23
moves to mate: 1. d6c7{I} d7f7 b8c7{I} b8b7{I} b8a7{I} b8c8 2. f7f8{C} 
c8d7{I} c8c7{I} c8b7 3. d6c7{I} f8g8 b7c6{I} b7b6 4. d6c6{I} g8g5 b6c5{I} 
b6c6{I} b6b5{I} b6b7 5. d6c7{I} d6d7 b7c6{I} b7b6 6. d7c6{I} d7d6 b6c5{I} 
b6c6{I} b6b5{I} b6b7 7. d6c6{I} g5c5 b7c6{I} b7b6 8. d6c7{I} d6c6{I} d6d5 
b6c5{I} b6c6{I} b6b5{I} b6b7 9. d5c6{I} d5d6 b7c6{I} b7b6 10. d6c6{I} 
d6d5 b6c5{I} b6c6{I} b6b5{I} b6b7 11. d5d6 b7c6{I} b7b6 12. d6c6{I} 
c5c7 b6c5{I} b6c6{I} b6b5 13. d6c5{I} c7c6 b5c4{I} b5c5{I} b5c6{I} b5b4 
14. d6c5{I} d6d5 b4c3{I} b4c4{I} b4c5{I} b4b3 15. d5c4{I} c6d6 b3c3 
16. d5c4{I} d5c5 c3d4{I} c3d3{I} c3c4{I} c3d2{I} c3c2 17. c5b4 c2c3{I} 
c2d3{I} c2d2{I} c2b3{I} c2b2 18. b4c3{I} b4a3{I} d6c6 b2c3{I} b2c2{I} 
b2b3{I} b2c1{I} b2b1 19. b4b3 b1b2{I} b1c2{I} b1c1{I} b1a2{I} b1a1 20.
c6c4 a1b2{I} a1b1 21. c4a4 b1b2{I} b1c2{I} b1c1 22. a4d4 c1b2{I} c1c2{I}
c1d2{I} c1d1{I} c1b1 23. d4d1{C} 1-0 {White mates}
8. Future Work and Conclusions

In this paper we have described a program which plays a Kriegspiel endgame. 
We started from a normal chess program and modified it to deal with the un- 
certainty typical for Kriegspiel playing. In order to evaluate our player, we 
have played several thousands of games showing that the evaluation function 
developed is a good basis for further refinements. 

We could have implemented a rule-based player based on the procedures 
reported in Leoncini and Magari (1980) and Boyce (1981). A first problem is 
that these papers do not prove that their procedures are correct and complete, 
so we would have no guarantee of obtaining a program that plays the KR vs K 
ending perfectly. Moreover, any rule-based solution would have been specialized 
to KR vs K only. Instead, we have adapted our player rather easily to another 
ending, namely KQ vs K, and we now plan to make similar experiments for other 
basic endings such as KBBK, KBNK, etc. 
Abstract We present a new method for taking advantage of the relative independence 
between parts of a single-player game. We describe an implementation for im- 
proving the search in a solitaire card game called Gaps. Considering the basic 
techniques, we show that a simple variant of Gaps can be solved by a straight- 
forward depth-first search (DFS); turning to variants with a larger search space, 
we give an approximation of the winning chances using iterative sampling. Our 
new method was designed to make a complete search; it improves on DFS by 
grouping several positions in a block, and searching only on the boundaries of the 
blocks. A block is defined as a product of independent sequences. We describe 
precisely how to detect interactions between sequences and how to deal with 
them. The resulting algorithm may run ten times faster than DFS, depending on 
the degree of independence between the subgames. 

Keywords: depth-first search, dependency-based search, block search, Gaps 

1. Introduction 

In this paper we consider a solitaire card game usually called Gaps, Montana, 
Rangoon or Blue Moon. We give approximations of winning chances for the 
game of Gaps and use the domain for testing new ideas. In the field of solitaire 
card games, we may also mention the game Freecell which has become a test 
domain in planning (Hoffmann, 2001). 

We have reasons to believe that techniques based on heuristics are not very 
useful in Gaps. However we have been able to improve the search in another 
way, by proving the independence between moves in different parts of the game 
and making use of it. A few search techniques with similar concerns exist but 
they are based on different principles (Allis, 1994; Junghanns and Schaeffer, 
2001; Botea, Müller, and Schaeffer, 2002). 
This paper is organised as follows. We explain the rules of Gaps in Section 2. 
We give results of the basic techniques and explain the reasons for our approach 
in Section 3. We present our method in Section 4 and experimental results in 
Section 5. Finally, in Section 6 we discuss possibilities for generalization and 
compare our method to the existing one which is the closest: dependency-based 
search (Allis, 1994). 

After our experiments, it was a surprise to find that a game called Superpuzz, 
studied by Berliner and more recently by Shintani, is a variant of Gaps. However, 
at the time this paper was written, we did not know precisely what work had 
been done on Superpuzz. We have found a short description of Berliner's (1997) 
work on the web, and Shintani's (1999, 2000) work has only been published in 
Japanese. 

2. Rules of the Game 

Below we explain the rules of what we call the basic variant (2.1). Then we 
describe a few other variants (2.2). Finally, we give some basic properties of 
the game (2.3). 

2.1 Basic Variant 

The game is usually played with a 52-card deck. The cards are placed in 
4 rows of 13 cards each. The 4 Aces are removed, resulting in 4 gaps in the 
position; then they are placed in a new column at the left in a fixed order (e.g., 
1st row Spade, 2nd Heart, 3rd Diamond, 4th Club). The goal is to create ordered 
sequences of the same suit, from Ace to King, in each row. A move consists of 
moving a card to a gap, thus moving the gap to where that card was. A gap can 
be filled only with the successor of the card on its left (that is, the card of the 
same suit and one higher in value). If there is another gap on the left, or if the 
card on the left is a King, no card can be placed in that gap. Figure 1 shows an 
initial position with only 4 cards per suit, before and after moving the Aces, 
and the possible moves. 
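
As an illustration, the move rule can be sketched in Python. The board encoding (a flat tuple, cards numbered suit * ranks + value - 1, GAP = -1), the names legal_moves and apply_move, and the toy deal are assumptions made for this sketch; they are not the representation used in the program described in this paper.

```python
from typing import List, Sequence, Tuple

GAP = -1  # hypothetical marker for an empty place


def legal_moves(board: Sequence[int], ranks: int) -> List[Tuple[int, int]]:
    """Return (source, target) place indices for every legal move.

    Assumed encoding: card = suit * ranks + (value - 1); each row has
    ranks + 1 places, the leftmost one holding the Ace (basic variant).
    """
    width = ranks + 1
    where = {card: place for place, card in enumerate(board) if card != GAP}
    moves = []
    for target, content in enumerate(board):
        if content != GAP or target % width == 0:
            continue  # not a gap (column 0 always holds an Ace)
        left = board[target - 1]
        if left == GAP or left % ranks == ranks - 1:
            continue  # gap or King on the left: nothing can be placed here
        moves.append((where[left + 1], target))  # successor of the left card
    return moves


def apply_move(board: Sequence[int], move: Tuple[int, int]) -> Tuple[int, ...]:
    """Move the card at `source` into the gap at `target`."""
    source, target = move
    new = list(board)
    new[target], new[source] = new[source], GAP
    return tuple(new)


# Toy deal with 2 suits of 3 cards (the Three plays the role of the King).
start = (0, GAP, 5, 2,
         3, 1, 4, GAP)
moves = legal_moves(start, ranks=3)
print(moves)
print(apply_move(start, moves[0]))
```

Because only the successor of the left-hand card fits a gap, each gap contributes at most one move, so a position of the basic variant never has more moves than gaps.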



Figure 1. An initial position with 4x4 cards, before and after moving the Aces. This position 
can be won. 
2.2 Other Variants 

The basic variant presented above is probably not the most common. Usually 
the Aces are not placed in a new column but are removed for good. Instead, 
any Two may be placed in a gap that lies in the first column. This gives 
more possibilities than in the basic variant where only one Two can go in each 
place of the second column (which was the first before moving the Aces). This 
difference is not a minor one, as it has a strong influence both on the size of the 
search space and on the probability of a winning game, as we will see. We call 
this variant the common one. The reason for our choice of the basic variant was 
to make the rules cleaner; this way only one card can be placed in any gap. 

The game is usually played with an additional rule which says that when the 
player gets stuck, he may remove the cards that are not part of an increasing 
sequence starting from the first column, and redeal them. Two redeals are 
allowed. We have not studied the game with this rule. However, as we will see, 
the probability of winning without this rule but with perfect play is higher than 
that obtained by human players using this rule. 

It is possible to change the number of suits and the number of cards per suit. 
This influences the size of the search space and the problem’s difficulty. It also 
has an effect on the degree of independence between subgames, which will be 
a major concern. 

The game that has been studied under the name Superpuzz is what we have 
called the common variant. There is only one minor difference: the gaps are 
created by removing the Kings instead of the Aces. 

2.3 Properties 

In the basic variant, every initial dealing results in a separate game of perfect 
information. This version has a remarkable property: in any position, the depth 
of the search graph is bounded; in particular there is no cycle. If we look at a 
particular card, of value v, there are only v - 1 places where it could be in the 
subsequent positions, in addition to its present location: it could be one space 
to the right of the card of the same suit and of value v - 1, two spaces to the 
right of the card of the same suit and of value v - 2 (which means that we have 
built a sequence v - 2, v - 1, v from the current position of the card of value 
v - 2 and of the same suit), ..., and v - 1 spaces to the right of the Ace of the 
same suit. The card cannot go to any of those places twice, so the number of 
moves of this card is bounded by v - 1. Therefore the total number of moves 
with 52 cards is bounded by 4 x (1 + 2 + 3 + ... + 12) = 312. 
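
The bound generalizes directly to other deck sizes, which is useful for the larger variants considered later. A minimal sketch, where the function name max_moves is an assumption of this example:

```python
def max_moves(suits: int, ranks: int) -> int:
    """Upper bound on the number of moves in the basic variant.

    A card of value v can move at most v - 1 times, so each suit
    contributes 0 + 1 + ... + (ranks - 1) moves.
    """
    return suits * sum(v - 1 for v in range(1, ranks + 1))


print(max_moves(4, 13))  # the standard 52-card deck
print(max_moves(6, 13))  # 6 suits of 13 cards
```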
3. Basic Techniques 

In this section we show results of either a depth-first search or an iterative 
sampling search applied to Gaps, then we discuss possibilities for improvement. 
Although the techniques presented here are basic, the results are about the best 
we could do without using the method, block search, that we present in the next 
section. 

3.1 Depth-First Search 

It turns out that in the basic variant the search space is small enough to allow 
a complete depth-first search enhanced with a transposition table. Assuming 
that we stop the search as soon as we find a winning path, the average size of the 
search space is about 250,000. It seldom goes above 2 million. This is small 
enough for all positions to be stored in a transposition table. A test on 10,000 
initial positions shows that the probability that an initial position can be won 
is about 24.8%. The length of winning solutions is usually in the range of 90 
to 130 moves. All the computations have been made on an Athlon 1600+ with 
1 GB of RAM. This search takes about 0.2 s per problem. 
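
The search just described can be sketched as follows. This is only an illustration under assumptions: the board encoding (card = suit * ranks + value - 1, GAP = -1), the function names and the two toy deals are hypothetical, and a Python set stands in for the transposition table.

```python
from typing import List, Optional, Tuple

GAP = -1  # hypothetical marker for an empty place
Board = Tuple[int, ...]
Move = Tuple[int, int]  # (source place, target place)


def legal_moves(board: Board, ranks: int) -> List[Move]:
    """Basic-variant moves; rows have ranks + 1 places, Ace in column 0."""
    where = {card: place for place, card in enumerate(board) if card != GAP}
    moves = []
    for target, content in enumerate(board):
        if content != GAP or target % (ranks + 1) == 0:
            continue
        left = board[target - 1]
        if left != GAP and left % ranks != ranks - 1:  # neither gap nor King
            moves.append((where[left + 1], target))
    return moves


def apply_move(board: Board, move: Move) -> Board:
    source, target = move
    new = list(board)
    new[target], new[source] = new[source], GAP
    return tuple(new)


def goal_board(suits: int, ranks: int) -> Board:
    """Every row holds one suit from Ace to King, followed by a gap."""
    cells: List[int] = []
    for suit in range(suits):
        cells.extend(range(suit * ranks, (suit + 1) * ranks))
        cells.append(GAP)
    return tuple(cells)


def solve(start: Board, suits: int, ranks: int) -> Optional[List[Move]]:
    """Depth-first search that stops at the first winning path."""
    goal = goal_board(suits, ranks)
    seen = set()  # transposition table: positions already expanded
    path: List[Move] = []

    def dfs(pos: Board) -> bool:
        if pos == goal:
            return True
        if pos in seen:
            return False
        seen.add(pos)
        for move in legal_moves(pos, ranks):
            path.append(move)
            if dfs(apply_move(pos, move)):
                return True
            path.pop()
        return False

    return list(path) if dfs(tuple(start)) else None


winnable = (0, GAP, 5, 2, 3, 1, 4, GAP)  # toy deal: 2 suits of 3 cards
stuck = (0, 4, GAP, 1, 3, 2, 5, GAP)
print(solve(winnable, suits=2, ranks=3))
print(solve(stuck, suits=2, ranks=3))
```

Since the depth of the search graph is bounded (Subsection 2.3), the recursion depth stays small even for the 52-card game.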

This is only the beginning of the story though, because the basic variant is 
far from being the most difficult one, and even in the basic variant the difficulty 
could be increased by playing with more cards. 

3.2 Iterative Sampling 

DFS is impractical in variants where the size of the search space is too big. 
Instead, iterative sampling (Harvey and Ginsberg, 1995) has proved to be 
surprisingly efficient. It consists of playing completely random moves until 
a goal is found or the player gets stuck, in which case the search restarts at the 
beginning. This is repeated until a probe is successful or the maximal number 
of probes is reached. 
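
A minimal sketch of this procedure is given below. The board encoding, the helper names and the toy deal are assumptions of the example, not details taken from the program described here; termination of each probe relies on the search graph being acyclic (Subsection 2.3).

```python
import random
from typing import List, Optional, Tuple

GAP = -1  # hypothetical marker for an empty place
Board = Tuple[int, ...]
Move = Tuple[int, int]


def legal_moves(board: Board, ranks: int) -> List[Move]:
    """Basic-variant moves; card = suit * ranks + (value - 1)."""
    where = {card: place for place, card in enumerate(board) if card != GAP}
    moves = []
    for target, content in enumerate(board):
        if content != GAP or target % (ranks + 1) == 0:
            continue
        left = board[target - 1]
        if left != GAP and left % ranks != ranks - 1:  # neither gap nor King
            moves.append((where[left + 1], target))
    return moves


def apply_move(board: Board, move: Move) -> Board:
    source, target = move
    new = list(board)
    new[target], new[source] = new[source], GAP
    return tuple(new)


def probe(start: Board, goal: Board, ranks: int,
          rng: random.Random) -> Optional[List[Move]]:
    """One probe: play random moves until the goal is reached or we are stuck."""
    pos, path = start, []
    while pos != goal:
        moves = legal_moves(pos, ranks)
        if not moves:
            return None  # stuck: this probe failed
        move = rng.choice(moves)
        path.append(move)
        pos = apply_move(pos, move)
    return path


def iterative_sampling(start: Board, goal: Board, ranks: int,
                       max_probes: int, rng: random.Random) -> Optional[List[Move]]:
    """Restart from the initial position until a probe succeeds."""
    for _ in range(max_probes):
        path = probe(start, goal, ranks, rng)
        if path is not None:
            return path
    return None


goal = (0, 1, 2, GAP, 3, 4, 5, GAP)   # toy game: 2 suits of 3 cards
start = (0, GAP, 5, 2, 3, 1, 4, GAP)
print(iterative_sampling(start, goal, ranks=3, max_probes=100,
                         rng=random.Random(0)))
```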

We give results of this algorithm both for the basic variant and for the common 
one (where the Aces are permanently removed and any Two can go in the first 
column). We consider the common variant because the typical size of its search 
space is too big to allow a complete search in a reasonable time (this property 
could also have been obtained by increasing the number of cards). This way we 
also get a first approximation of the winning chances for the common variant, 
which are unexpectedly high. 

Table 1 shows the results of an experiment on a set of 1000 random initial 
positions. The set of positions is always the same, except that, for accuracy, 
experiments with fewer than 1000 probes have been made on 100,000 initial 
positions. One probe takes about 4.5 µs. This amounts to about 450 s for 10^8 
probes when they are all unsuccessful; the average time per problem is 425 s and 
164 s for the basic and common variants, respectively. 

It is a particularity of our domain that we have a slight chance of winning by 
making random moves. The efficiency of the algorithm is due to its covering a 
well-distributed part of the search space and its avoiding getting lost in large 
parts of the tree where it is impossible to win. 



max number    success rate     success rate 
of probes     basic variant    common variant 
1             0.046%           0.041% 
10            0.373%           0.396% 
10^2          1.37%            2.96% 
10^4          7.1%             26.6% 
10^5          9.7%             43.1% 
10^6          12.8%            53.0% 
10^7          14.5%            60.3% 
10^8          16.4%            66.9% 

Table 1. Results of iterative sampling. 

3.3 Combining a Depth-bounded Search with Iterative Sampling 

Iterative sampling can be combined with a depth-bounded complete search. 
One possibility is to make a breadth-first search until exhaustion of memory 
resources, and to make one or more random probes at each node of this search. 
The results are better compared to simple iterative sampling, probably because 
it ensures a better distribution of the probes in the search space. Furthermore 
this method will also prove some problems impossible when the search space 
is searched completely. 

We have run a test on 100 random initial positions for the common variant. 
The breadth-first search was limited by the number of positions that could be 
stored in memory: this number was set to 5,000,000. One random sample was 
performed at each node. The program took 144 s per problem on average; it 
found solutions for 88 of the initial positions and proved 4 impossible. 
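
The combination can be sketched as below. To keep the example short it uses the rules of the basic variant rather than the common variant used in the experiment; the encoding, the names, the three-valued result and the choice of one probe per node are assumptions of this sketch.

```python
import random
from collections import deque
from typing import List, Tuple

GAP = -1  # hypothetical marker for an empty place
Board = Tuple[int, ...]
Move = Tuple[int, int]


def legal_moves(board: Board, ranks: int) -> List[Move]:
    """Basic-variant moves; card = suit * ranks + (value - 1)."""
    where = {card: place for place, card in enumerate(board) if card != GAP}
    moves = []
    for target, content in enumerate(board):
        if content != GAP or target % (ranks + 1) == 0:
            continue
        left = board[target - 1]
        if left != GAP and left % ranks != ranks - 1:  # neither gap nor King
            moves.append((where[left + 1], target))
    return moves


def apply_move(board: Board, move: Move) -> Board:
    source, target = move
    new = list(board)
    new[target], new[source] = new[source], GAP
    return tuple(new)


def random_probe(pos: Board, goal: Board, ranks: int, rng: random.Random) -> bool:
    """Play random moves from pos; True if the goal is reached."""
    while pos != goal:
        moves = legal_moves(pos, ranks)
        if not moves:
            return False
        pos = apply_move(pos, rng.choice(moves))
    return True


def bfs_with_probes(start: Board, goal: Board, ranks: int,
                    max_nodes: int, rng: random.Random) -> str:
    """Breadth-first search storing at most max_nodes positions,
    with one random probe launched from every stored node."""
    seen = {start}
    queue = deque([start])
    truncated = False
    while queue:
        pos = queue.popleft()
        if random_probe(pos, goal, ranks, rng):
            return "solved"
        for move in legal_moves(pos, ranks):
            child = apply_move(pos, move)
            if child in seen:
                continue
            if len(seen) >= max_nodes:
                truncated = True  # memory budget exhausted: stop expanding
                continue
            seen.add(child)
            queue.append(child)
    # If nothing was cut off, the whole space was searched without a win.
    return "unknown" if truncated else "impossible"


goal = (0, 1, 2, GAP, 3, 4, 5, GAP)   # toy game: 2 suits of 3 cards
print(bfs_with_probes((0, GAP, 5, 2, 3, 1, 4, GAP), goal, 3, 1000, random.Random(0)))
print(bfs_with_probes((0, 4, GAP, 1, 3, 2, 5, GAP), goal, 3, 1000, random.Random(0)))
```

A position is reported impossible only when the breadth-first search has enumerated every reachable position within the memory budget, which mirrors the way some problems were proved impossible above.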

3.4 Comparison with Human Performance 

Estimates of the chances of winning for human players are based on various 
sources from the web and on personal experience. When no redeal is allowed, 
the chances of winning are about 1%. The exact rule concerning the gaps in 
the first column apparently has little effect on the difficulty of the game for 
human players, but we have shown it is important for the computer. This 1% 
figure must be compared with 24.8% (basic variant, complete DFS), 66.9% 
(common variant, iterative sampling) and 88% (common variant, combination 
of breadth-first search and iterative sampling). 

With two redeals allowed, the chances of winning for human players are about 
25%, still well below 88%. 
3.5 Discussion on Possibilities for Improvement 

Iterative sampling is a simple and efficient technique for the game of Gaps; 
however, in the process of designing a search program for solving Gaps, this 
took us quite some time to realize. Previously, our attempts at solving Gaps and 
its variants were based on complex best-first search algorithms. We gradually 
realized that it was important to try to reach the goal often, without caring 
much about the quality of the moves. At that time our algorithms worked roughly 
as follows: do some tree search in a best-first way and, from time to time during 
this search, launch random samplings. We finally found that the strength of 
our program was almost entirely due to the random samplings. 

At the beginning, we had been working on yet another variant of the game. 
This variant differs from the common one by the rule that we may move into a 
gap not only the successor of the card on the left but also the predecessor of 
the card on the right. We felt that heuristics would be much more effective in 
this variant. We did get some successes using heuristics, but even there random 
sampling alone would do about as well. 

In the basic variant the situation is worse. What heuristics do we have? 
First, there is the number of cards that are already in their final location. This 
is the only simple heuristic we know about, but unfortunately it gives a poor 
evaluation of the position, as it often happens that most cards only get in their 
correct location in the endgame. Then there could be heuristics concerning the 
mobility of the cards, in the present and in the long term, but this is difficult to 
estimate. 

It is possible that good heuristics could be found. However a comparison 
with human performance shows that we are not so bad with iterative sampling. 
One can see that one sample by human players is roughly as successful as 100 
random samples. This indicates that human players do not use very efficient 
heuristics anyway. Even if we could do as well as humans on one sample, 
the time needed for computing heuristics would probably make the effort 
not worthwhile. Because heuristics are weak, any best-first search 
algorithm would also be of limited use. As an example there is the well-known 
IDA*; we have experimented with it but did not achieve better results than with 
a depth-first search. 

The next part of this paper takes an orthogonal approach to the heuristic one. 
Our goal is to go through the search space completely, without even caring 
whether we find a winning sequence. We want to do it faster than a depth-first 
search would, by simplifying the search space. Thanks to this, we will be able 
to determine for sure if there is a solution in some problems where depth-first 
search is not applicable; for instance in the basic variant when playing with 
more cards. 
4. Block Search 

In this section we present a new method that aims at proving the independence 
between some parts of the problem and taking advantage of it, while keeping 
the search complete. 

From now on the focus will be on the basic variant only, because we will 
make use of the fact that in any position only one card can be placed in a 
gap. The common variant does not have this property, and therefore more work 
would be needed in order to generalize the method to this variant. 

4.1 Related Work 

Among existing search algorithms, the closest to ours is probably 
dependency-based search by Victor Allis (1994). He applied his method 
to three domains: the double letter problem, and the search for winning 
sequences in Qubic and GoMoku (the last two, being two-player games, were first 
transformed into single-agent games). In fact the starting point of our work 
was a failure to adapt this algorithm to the game of Gaps. Pseudo-code for 
the algorithm was given, but a function called NotInConflict was not made 
explicit; we believe that this function was easy to write in the domains where 
the algorithm had been implemented but would be difficult to write for Gaps, 
at least statically. 

The goal of our method is also similar to that of Junghanns' Relevance Cuts 
for Sokoban (Junghanns and Schaeffer, 2001). He suggested that relevance can 
be approximated by computing an influence between moves, and then penalizing 
moves that are not relevant to the previous ones. His work was done in the 
context of an IDA* search, so in his method moves are never definitively 
eliminated; they may only get a penalty. The method we have developed handles 
the problem more precisely. 

A more recent work on Sokoban (Botea, Müller, and Schaeffer, 2002) addresses 
the problem in yet another way, by decomposing the position into rooms 
and precomputing the graph of states in each room. The major difference with 
our work is that the states in the subgames are precomputed, and this does not 
seem possible in Gaps. 

4.2 Principle of the Method 

We name the four gaps A, B, C, D, and break the game into four subgames 
also named A, B, C, D. The moves allowed in one subgame are those that use 
the corresponding gap. If one plays only in one subgame, one makes a linear 
sequence of moves. This sequence moves the same gap from place to place until 
getting stuck, which can happen for either of two reasons: 
there is another gap on the left, or the card on the left is a King. Whereas the 
moves in the same subgame are totally ordered, moves from different subgames 
are often independent. We want to take advantage of this relative independence 
between the subgames. 

Let a block be defined by its starting position and, in each subgame X, 
a sequence S_X, possibly empty, from the starting position, such that the four 
sequences are independent of each other. A block represents a set of positions: 
all positions that can be reached from the starting position of the block with the 
moves of the sequences S_X in any order. It helps to see a block geometrically 
as embedded in a four-dimensional space. If l_X is the length of sequence S_X, 
the block represents l_A x l_B x l_C x l_D positions. Finally, we call F_X the face 
of the block that consists of the positions of the block where all the moves of 
sequence S_X have been made. 
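
The geometric picture can be illustrated with a short sketch. It identifies a position of the block with a tuple giving, for each subgame, how many moves of its sequence have been made. The sketch assumes that the length of a sequence counts the positions along it (the number of moves plus one), so that an empty sequence contributes a factor of 1; the function names and the example block are hypothetical.

```python
from itertools import product
from math import prod
from typing import Dict, List, Tuple


def block_positions(moves_in: Dict[str, int]) -> List[Tuple[int, ...]]:
    """All coordinates (k_A, k_B, ...), where k_X is the number of moves
    of sequence S_X already made (0 <= k_X <= number of moves in S_X)."""
    return list(product(*(range(n + 1) for n in moves_in.values())))


def face(moves_in: Dict[str, int], subgame: str) -> List[Tuple[int, ...]]:
    """Face F_X: the positions where every move of S_X has been made."""
    axis = list(moves_in).index(subgame)
    return [p for p in block_positions(moves_in) if p[axis] == moves_in[subgame]]


# Hypothetical block: 2 moves in A, 1 in B, none in C, 3 in D.
block = {"A": 2, "B": 1, "C": 0, "D": 3}
print(len(block_positions(block)), prod(n + 1 for n in block.values()))
print(face(block, "D"))
```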

The main idea of the algorithm is: instead of searching one position at a time, 
we search one block at a time. Instead of recursively searching the immediate 
children of a position, we construct new blocks at the boundary of a block and 
recursively search them. 

We want to construct blocks of the biggest possible size, so before building 
blocks on the boundary, we try to extend them as much as possible in the four 
subgames. Figure 2 shows a pseudo-code for the algorithm. 

void search(block) { 
    for each subgame X 
        extend block in the subgame X, as long as 
        all the sequences keep being independent; 
    for each subgame X 
        build new blocks near the face F_X of the block, 
        such that any move we can do from F_X goes to 
        one of those blocks, and search them recursively; 
    test for a winning position in the block; 
} 

Figure 2. Pseudo-code for block search. 

We still have to show how to extend blocks and construct new blocks at the 
boundaries. Besides, the pseudo-code does not include a transposition table, 
and this will lead to some problems to be addressed in Subsection 4.7. 

4.3 Study of the Basic Interaction 

We study in detail the case of a single interaction between two sequences. 
Figure 3 shows the useful part of the position and a diagram which summarizes 
the relation between the two sequences. For simplicity all cards are of the same 
suit. We assume that both sequences begin a few moves before the interaction 
and end a few moves after, although the moves that are not critical have not been 
drawn. The dotted arrow in the right diagram indicates the action of sequence 
S_B on sequence S_A. Assume we are just after move a_1 in S_A. If b_1 has not yet 
been made, move a_2 can be made in subgame A; if move b_2 has already been 
made, move a'_2 can be made in subgame A; if move b_1 has been made but not 
move b_2, no move can be made in subgame A. 



Figure 3. A basic interaction. 



Figure 4 shows a representation in the plane of the search space, where the 
positions lie at the intersections of the lines. Imagine there was no interaction 
between the two sequences; we would have a big square with the entire 
sequences A and B on the edges. The effect of the interaction is to cut this 
square along a line from the point p to the right side (the double line in the 
figure), and to stick another part along the cut, which corresponds to sequence 
S_A taking the bifurcation. The position at p is particular: it is the position 
where the two gaps are adjacent, so that no move can be made in subgame A. 
This point corresponds to the dotted arrow in Figure 3. We say that there is a 
bifurcation of sequence S_A, caused by an action of sequence S_B. One must 
imagine that there are two other dimensions corresponding to the subgames C 
and D; if the sequences in these subgames do not introduce new interactions, 
the complete search space will be a simple product of the graph in Figure 3 with 
the sequences S_C and S_D. 

Figure 4. Search space corresponding to a basic interaction. 

We are looking for ways to partition the search space into blocks. There 
are several ways to do it; Figure 5 shows the two ways we will use. They 
deal with the two possible shapes of the first block. Whether we get the 
first or the second depends on the order in which we have extended the first 
block: first in subgame A or B. Subsection 4.6 will explain precisely how to 
detect interactions when extending blocks and how to build new blocks at the 
boundary. For the moment, we note that in the first possibility blocks 2 and 3 
are children of block 1, whereas in the second only blocks 2 and 4 are children 
of block 1, and block 3 is a child of block 2. 




Figure 5. Two ways to decompose the search space into blocks. 

4.4 Why the Basic Interaction is the Only One to Consider 

We have shown how to deal with the basic interaction; it turns out that it 
is the only one we have to consider. Let S_X and S_Y be two sequences in the 
subgames X and Y. Let us enumerate all the ways that a move y of sequence 
S_Y could influence a move x of sequence S_X. Move x consists of taking 
the card c from place p_1 to place p_2. The prerequisites for this move are: 

1. there is a gap at p_2, 

2. the card c is in p_1, 

3. the card at the left of p_2, c_1, is the predecessor of c. 

Those preconditions are satisfied if we make only moves from sequence S_X up 
to x, but they could be destroyed by moves of sequence S_Y. Assume we have 
already established the independence of the sequences S_X and S_Y up to the 
moves x and y; then precondition 1 is automatically satisfied as soon as we have 
executed all the moves of sequence S_X up to x, whatever we have done in 
the sequence S_Y up to y, since it is an effect of the beginning of sequence S_X 
to put a gap in p_2. 

Precondition 2 is automatically satisfied too. This comes from the fact that 
the card c can only go to the right of c_1, wherever this card is. If this card was 
moved by sequence S_Y, then there would already be an interaction because of 
precondition 3. If move y moved card c to the right of c_1 and this card was still 
at the left of p_2, the trajectories of the gaps X and Y in the sequences S_X and 
S_Y up to x and y would both pass through p_2, which again would imply that 
they are dependent. 

Therefore only precondition 3 remains to be considered, which produces an 
interaction of the kind already analysed. 
4.5 Using a Trace to Speed Up the Discovery of the Interactions 

The trace of the sequences S_X is an array of the same size as the position 
(4x13). It contains information for each place p indicating whether and at which 
move the trajectory of the gap for any sequence passes through p. The trace 
is maintained incrementally as we build new blocks. 

Assume we make a new move m, which moves a card from p_1 to p_2. We 
want to know if it produces new interactions with the sequences already built. 
A look in the trace at the place just on the left of p_2 shows if any sequence has 
an effect on move m. A look in the trace at the place just on the right of p_1 
shows if move m has an effect on any sequence. This way the search for an 
interaction is done very quickly. 
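
The two lookups can be sketched as follows. The class Trace, its methods, the flat numbering of places (rows of ranks + 1 places, including the Ace column) and the convention that index 0 denotes the initial place of a gap are all assumptions of this example; the paper does not specify the actual data layout.

```python
from typing import List, Tuple

Visit = Tuple[str, int]  # (subgame name, index of the move in its sequence)


class Trace:
    """For every place, which gap trajectories pass through it, and when."""

    def __init__(self, suits: int, ranks: int) -> None:
        self.width = ranks + 1  # assumed row width, Ace column included
        self.visits: List[List[Visit]] = [[] for _ in range(suits * self.width)]

    def record(self, subgame: str, move_index: int, place: int) -> None:
        """The gap of `subgame` is at `place` after its move `move_index`."""
        self.visits[place].append((subgame, move_index))

    def check(self, subgame: str, p1: int, p2: int) -> Tuple[List[Visit], List[Visit]]:
        """Interactions of a new move of `subgame` taking a card from p1 to p2.

        Returns (sequences acting on the move, sequences the move acts on).
        """
        def others(place: int) -> List[Visit]:
            return [v for v in self.visits[place] if v[0] != subgame]

        # Place just on the left of p2 (none if p2 is in the first column).
        acted_on_by = others(p2 - 1) if p2 % self.width else []
        # Place just on the right of p1 (none if p1 is in the last column).
        acts_on = others(p1 + 1) if (p1 + 1) % self.width else []
        return acted_on_by, acts_on


trace = Trace(suits=4, ranks=13)
trace.record("A", 0, 7)    # gap A starts at place 7
print(trace.check("B", p1=6, p2=20))
trace.record("C", 2, 19)   # gap C reaches place 19 after its second move
print(trace.check("B", p1=6, p2=20))
```

Each check inspects only two entries of the trace, whatever the length of the sequences already built.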

The method of doing a local search and storing the set of properties of 
the position on which the result relies has already been used by other people 
in different domains: in Go, with the goal of incrementally updating local 
results (Bouzy, 1997); in Generalized Threats Search (Cazenave, 2002), which 
is a two-player selective search algorithm that relies on a trace to find a set 
of relevant moves; in the algorithm H-search used in the Hex program Hexy 
(Anshelevich, 2002) with a bottom-up approach, building increasingly complex 
virtual connections. 

4.6 Building and Extending Blocks 

The procedure for building blocks at the boundary is tricky, because we have 
to take into account all the interactions that might occur. Although there is 
only one kind of interaction that needs to be considered, it can come in the 
two different configurations shown in Figure 5, and we must be prepared for 
several configurations occurring at once. Figure 6 represents a search space in two 
dimensions that gives an idea of the kind of situations we have to deal with. 

We are on face F_B of block b. We try to find out what moves can be made in 
subgame B depending on the exact location on F_B, and whether the moves have an 
action on the other sequences. This situation occurs twice in the program: first 
when we are trying to extend the block in subgame B (which can be done as 
long as there is no interaction), second when we are building new blocks near 
face F_B (generally because we have already found an interaction). We must 
answer the following questions in this order. 



1. Is there an action of any other sequence of the block that will cause a 
   bifurcation on S_B? This is the case if and only if the trajectory of the 
   gap for any other sequence passes through the place at the left of gap 
   B. This can be decided quickly by looking at the trace. An example is 
   interaction 1 in Figure 6. 

2. Now we know that the move to be made in subgame B does not depend 
   on the location on face F_B, or we have already restricted ourselves to 
   an area where this is the case. Is a move possible at all? There cannot 
   be another gap on the left of gap B because of the previous step, so the 
   question is whether the card on the left is a King (in this case the block 
   cannot be extended, and no block will be constructed on this part of the 
   boundary). 

3. Now we know we can make a move; this move takes a card from a place 
   p and moves it into gap B. Is there an action of this move on the other 
   sequences? This is the case if and only if the trajectory of the gap for any 
   other sequence passes through the place at the right of p. This too can 
   be decided quickly by looking at the trace. An example is interaction 2 
   in Figure 6. 

Figure 6. Several interactions on one face of a block. 



When building new blocks at the boundary, one must go through these three 
steps. In steps 1 and 3 we may have to break face F_B into two parts (in some 
degenerate cases there may be one or zero parts) and apply the following steps 
to each. After step 3 we have isolated a part G of the face F_X. We know that 
we can make a move m anywhere on G and that this move has no effect on the 
other sequences. We then create a new block by making move m from G, and 
search it recursively. 

A block c that has just been built on face F_X of a block b has no depth in 
subgame X. When we try to extend block c in all the subgames, it is generally 
successful for subgame X; on the contrary, it is generally not successful in 
the other subgames because the reasons why block b had been stopped in those 
subgames often still hold for block c. 
4.7 Adding a Transposition Table 

The decomposition in blocks already handles all the transpositions within the 
blocks; this is good but insufficient. In order for the algorithm to be efficient, it 
is almost compulsory to have a global transposition table. However, when we 
construct a new block or when we extend one, there could be common positions 
anywhere in this block and in another already built one. 

Now we do not want to go to all the positions in every block and mark them in 
the transposition table, because the advantages of our method would disappear. 
We have to look for a compromise: we could mark only a well-chosen part of 
each block and hope it will be sufficient to detect most transpositions. 

We define the number of positions of a block search as the sum, for each 
block, of the number of positions contained in this block. Let R be the ratio 
between the number of positions of a block search and the number of positions 
of a depth-first search. Ideally, if the transposition table is large enough to 
contain all the positions of the search space and if the blocks are mutually 
disjoint, then R = 1. If the blocks are not mutually disjoint, then R will be 
larger; we need to control how much larger it will be. 

A first possibility is to mark only the starting position of each block. An 
experiment on 100 random positions for the basic variant has shown that the 
ratio R is about 3.95, which is too much. 

A second possibility is to mark only the positions that can be reached from the 
initial position of the block by making moves in a single one of the four 
sequences of the block. Geometrically, those are the points located on the 
four edges of the block starting from the initial position. The ratio R drops 
to 1.33, which is acceptable although a better compromise probably exists. 
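
The second marking scheme can be illustrated by enumerating the marked coordinates of a block. As in the geometric view of Subsection 4.2, a position is written as a tuple counting the moves made in each sequence; the function name and the example block are assumptions of this sketch.

```python
from math import prod
from typing import List, Sequence, Tuple


def edge_positions(moves_in: Sequence[int]) -> List[Tuple[int, ...]]:
    """Coordinates reached from the starting position of the block by
    making moves in a single sequence: the edges through the origin."""
    origin = tuple(0 for _ in moves_in)
    marked = [origin]
    for axis, n in enumerate(moves_in):
        for k in range(1, n + 1):
            marked.append(origin[:axis] + (k,) + origin[axis + 1:])
    return marked


# Hypothetical block with 2, 1, 0 and 3 moves in subgames A, B, C and D.
moves_in = (2, 1, 0, 3)
marked = edge_positions(moves_in)
print(len(marked), "positions marked out of", prod(n + 1 for n in moves_in))
```

The number of marked positions grows with the sum of the sequence lengths, whereas the block itself grows with their product, which is why this marking stays cheap for large blocks.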

5. Experimental Results 

The method was designed to be complete; we have verified experimentally 
that it is indeed the case. This is a sign that we have correctly analysed all the 
possible interactions that can occur at the boundaries of the blocks. The method 
for verifying the completeness was the following: from an initial position, first 
run a complete depth-first search and store all the positions of the search space; 
then run a block search and verify that all the positions of the search space lie 
in at least one of the blocks. This verification has been done for 1000 initial 
positions. 

Table 2 shows statistics about an experiment on 1000 random initial positions 
for the basic variant. There is a difference in time and number of positions 
compared to Subsection 3.1 because the search is not stopped when we find a 
winning position. Also the transposition table is not implemented in the same 
way: before, it could grow as needed; now we use a hash table of fixed size, as 
is usual in game programming (Breuker, 1998). The hash table has 64 million 
entries. 



avg. number of positions for DFS    502,000 
avg. number of blocks               36,200 
avg. size of blocks                 18.6 
R                                   1.34 
avg. time for DFS                   2.28 s 
avg. time for block search          2.04 s 

Table 2. Basic variant (4 suits, 13 cards/suit). 



The average size of the blocks is 18.6, so one node of block search does as 
much work as 18.6 nodes of DFS on average. As we have already mentioned, the 
total number of positions in all the blocks is larger than the number of 
positions searched by DFS, by about 34%. The final result is a gain in speed 
of 11% for block search. In the present case, however, the difference in time 
says little about the performance of block search because, for both 
algorithms, much time is spent reinitializing the large transposition table 
between problems. 

We do not see the power of block search yet. Higher gains in speed can 
be obtained in variants with a larger search space, and with a higher degree 
of independence between the subgames. This can be achieved by increasing 
the number of cards. We therefore turn to 6 suits and 13 cards per suit. This 
increases both the number of cards and the number of subgames. 

It is difficult to give average statistics because the size of the problems 
varies a great deal, some being too big for DFS and a few even for block 
search. We have made 15 experiments with initial positions that could be 
completely searched both with block search and DFS. In Table 3 we show 
detailed statistics for one of them, which is typical. We also show in 
Figure 7 that the gain in speed is correlated to the size of the search space. 
This is promising. Our algorithm clearly has an advantage over a depth-first 
search because it can build large blocks, and this advantage would grow larger 
if the number of suits and/or cards per suit were further increased. 

number of positions for DFS     289 x 10^6 
number of blocks                5.00 x 10^6 
size of blocks                  59.7 
R                               1.03 
time for DFS                    437s 
time for block search           44s 

Table 3. 6 suits, 13 cards/suit, the hashtable has 64 million entries. 

[Plot: vertical axis with ticks from 2 to 22; horizontal axis "number of 
positions for DFS", logarithmic, from 1e+07 to 1e+10.] 

Figure 7. Gain in speed of block search over DFS, for 15 initial positions. 

In the experiment of Table 3, the ratio R has dropped from 1.34 to 1.03. This 
is due to collisions in the transposition table. This phenomenon is amplified 
if we decrease the size of the transposition table: Table 4 shows statistics 
about the same problem but with a hashtable that can only contain 8 million 
positions. This has a dramatic effect on DFS, but almost no effect on our 
algorithm. Because of collisions, the ratio R is even less than 1! So now the 
situation is reversed: it is our algorithm that makes better use of the 
transposition table. 



number of positions for DFS     1550 x 10^6 
number of blocks                5.88 x 10^6 
size of blocks                  61.1 
R                               0.23 
time for DFS                    1730s 
time for block search           46s 

Table 4. 6 suits, 13 cards/suit, the hashtable has 8 million entries. 
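
As a check on the arithmetic in Tables 2 to 4, the sketch below recomputes R 
and the ratio of the two running times from the other table entries. It 
assumes that R is the total number of positions in all blocks (number of 
blocks times average block size) divided by the number of positions visited 
by DFS, which is how the surrounding text uses it; the function and variable 
names are ours. 

```python
def overlap_ratio(num_blocks, avg_block_size, dfs_positions):
    """R, read as (positions summed over all blocks) / (positions visited by DFS)."""
    return num_blocks * avg_block_size / dfs_positions


# (number of blocks, size of blocks, DFS positions, DFS time [s], block-search time [s])
TABLES = {
    "Table 2": (36_200, 18.6, 502_000, 2.28, 2.04),
    "Table 3": (5.00e6, 59.7, 289e6, 437.0, 44.0),
    "Table 4": (5.88e6, 61.1, 1550e6, 1730.0, 46.0),
}


def summarize(tables):
    """Map each table name to (R rounded to 2 digits, time ratio rounded to 1 digit)."""
    summary = {}
    for name, (blocks, size, dfs, t_dfs, t_block) in tables.items():
        ratio = round(overlap_ratio(blocks, size, dfs), 2)
        time_ratio = round(t_dfs / t_block, 1)
        summary[name] = (ratio, time_ratio)
    return summary


if __name__ == "__main__":
    for name, (ratio, time_ratio) in summarize(TABLES).items():
        print(f"{name}: R = {ratio:.2f}, time(DFS) / time(block search) = {time_ratio:.1f}")
```

The recomputed values of R agree with the R rows of the three tables. 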



6. Perspectives 

We conclude the paper by providing two perspectives. In Subsection 6.1 
possibilities for generalization are given. In Subsection 6.2 dependency-based 
search is compared to block search. 



6.1 Possibilities for Generalization 

The general idea of the method does not rely much on the domain of Gaps. 
Our notion of a block can in principle find equivalents in many domains, pro- 
vided that we generalize it a little. Until now we have worked with blocks 
that are products of independent sequences; as a first generalization, we should 
define blocks as products of independent graphs. In most domains there will 
be parts of the problem that will be, at least locally, relatively independent. 

To apply the method, we must define what a subgame is, by stating which 
moves belong to which subgame, and we must analyse precisely all kinds of 
interactions that could occur between them. This analysis is difficult and is 
domain-dependent, but then the rest is similar to what we did in Gaps: build and 
extend subgraphs in each subgame only as long as they keep being independent. 
The product of those graphs gives a block. Then we build new blocks at the 
boundary of this block and search them recursively. 

Therefore we claim that the idea of decomposing the search space into blocks 
is a natural way to simplify the search space and may be applicable to other 
domains. Furthermore, the method could be much more powerful in domains 
with more independence between subproblems, leading to the construction of 
much larger blocks. 

6.2 Comparison between Dependency-based Search and 
Block Search 

As a first application domain for dependency-based search, Allis (1994) 
considered the double-letter problem. In this domain, a state consists of a 
word over the set of 5 letters {a, b, c, d, e}. Any double occurrence of a 
letter can be replaced by a single instance of its alphabetical predecessor or 
successor. The alphabet is cyclic, so ee can be replaced by either d or a, and 
aa by either e or b. The winning states are the one-letter words. A detailed 
application of the method was shown by Allis for the initial state aaccadd. A 
solution exists (the letters that change have been capitalized): 

aaccadd -> Bccadd -> bBadd -> Aadd -> Edd -> eE -> A 
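
To make the rules of the double-letter problem concrete, here is a minimal 
sketch of a move generator and a plain breadth-first search for this domain. 
It is neither dependency-based search nor block search; it only enumerates 
states. The names `successors` and `solve` are ours. 

```python
from collections import deque

ALPHABET = "abcde"  # cyclic: the predecessor of 'a' is 'e', the successor of 'e' is 'a'


def successors(word):
    """All words obtained by replacing one double letter xx by pred(x) or succ(x)."""
    result = set()
    for i in range(len(word) - 1):
        if word[i] == word[i + 1]:
            idx = ALPHABET.index(word[i])
            for step in (-1, 1):
                letter = ALPHABET[(idx + step) % len(ALPHABET)]
                result.add(word[:i] + letter + word[i + 2:])
    return result


def solve(start):
    """Breadth-first search for a path to a one-letter word; None if there is none."""
    parent = {start: None}
    queue = deque([start])
    while queue:
        word = queue.popleft()
        if len(word) == 1:
            path = []
            while word is not None:
                path.append(word)
                word = parent[word]
            return path[::-1]
        for nxt in sorted(successors(word)):
            if nxt not in parent:
                parent[nxt] = word
                queue.append(nxt)
    return None


if __name__ == "__main__":
    print(" -> ".join(solve("aaccadd")))
```

Every move shortens the word by one letter, so any solution from aaccadd has 
exactly six moves, like the one shown above. 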

We are going to compare the way this instance is solved by dependency-based 
search (according to Allis) and a way it could be solved by a block search 
algorithm. Dependency-based search runs as a succession of dependency 
stages and combination stages. After one dependency stage and one combination 
stage, Allis gets the graph in Figure 8: he considers the 6 moves possible at 
the root and finds that two can be combined. The same situation can be 
represented with blocks (Figure 9): we have 3 independent subgames 
corresponding to the letters at positions (1, 2), (3, 4) and (6, 7), 
respectively. In each of those subgames two moves can be made from the initial 
position. Therefore the set of positions reachable with these moves can be 
represented with a cube, the initial position aaccadd being in the centre. We 
then find an interaction at one of the edges of the cube: the two letters “B” 
that have been created allow a move in a new subgame and therefore a new block 
can be constructed. 




[Figure 8: the root aaccadd has the six successors Eccadd, Bccadd, aaBadd, 
aaDadd, aaccaC and aaccaE; the first combination stage adds the node BBadd.] 

Figure 8. Dependency-based search, beginning. 

The rest of the search continues similarly with dependency-based search 
(Figure 10) and block search (Figure 11). At least in this example we are really 
doing the same thing with different representations. 

Figure 9. Block search, beginning. 

[Figure 10: the graph of Figure 8, with root successors Eccadd, Bccadd, 
aaBadd, aaDadd, aaccaC and aaccaE and the combined node BBadd, extended with 
the nodes Aadd, Cadd, EE, Bdd, Edd, D and A.] 

Figure 10. Dependency-based search, complete. 

Figure 11. Block search, complete. 

This goes to show that the two methods have similarities. However, there are 
some differences that cannot easily be seen in the last example. First, we do 
not see all the power of block search here: compared with dependency-based 
search, we believe it can deal with interactions of a more complicated nature 
(as in Gaps, where we could not apply dependency-based search). Probably we 
do not see all the power of dependency-based search either. For instance, in 
contrast to block search, it is not necessary to provide an explicit 
decomposition into subgames to apply dependency-based search. 

7. Conclusion 

We have presented several search algorithms that take advantage of the par- 
ticularities of the game of Gaps. Our work has resulted in a method, block 
search, which may be applicable in other domains. 

We have shown that iterative sampling produces good results, both for the 
basic variant and for the common variant. Conversely, we have shown that the 
use of heuristics is not so promising. Therefore we could concentrate on a 
single problem in isolation: exploiting the independence between parts of the 
game. Existing methods that deal with this problem were either not applicable 
to the domain of Gaps or were not as precise as ours. 

Block search is a method to take advantage of a decomposition into subgames 
when there are interactions between those subgames, while keeping the search 
complete. It implies analysing theoretically all types of interactions that can 
occur: how to detect them, how to deal with them by building new blocks at 
the boundary of the current block. Although this analysis relies on domain- 
dependent knowledge, the general idea of the method does not. Experimental 
results have shown that large gains in speed over a depth-first search can be 
expected, depending on the average size of the blocks we are able to build. 
Specifically, the method can be used to solve positions of the basic variant of 
Gaps with more cards. Because the method simplifies the search space, it also 
makes better use of a transposition table. 



Abstract Kotani (2002) determined the part of the state space of the Japanese Oshi-Zumo 
game in which pure strategies suffice to win. This paper completes the analysis 
by computing and discussing a Nash-optimal mixed strategy for this game. 

Keywords: Nash-optimal strategy, Oshi-Zumo, two-player game 



1. Introduction 

In this article the Japanese game Oshi-Zumo is analyzed. Moves in this game 
consist of simultaneous actions by two players who otherwise have complete 
information about the current game state. In general, such games can be repre- 
sented by a collection of payoff matrix pairs whose entries define the expected 
amount paid to the players in case the respective action pair was chosen. It is 
well known that not knowing the opponent’s action already makes it necessary 
to consider mixed strategies and that so-called Nash-optimal mixed strategies 
exist for any matrix game (Nash, 1950). A simple example is the Rock-Paper- 
Scissors game in which Rock beats Scissors, Scissors beats Paper, and Paper 
in turn beats Rock. The Nash-optimal strategy picks each of the actions with 
probability 1/3. 

In what follows, we first introduce the Oshi-Zumo game. It is more complex 
than Rock-Paper-Scissors, but considerably simpler than other popular incom- 
plete information games such as Poker and Bridge. In fact, we will show how 
to compute a Nash-optimal strategy within seconds on ordinary PC hardware. 
We then highlight interesting properties of a Nash strategy and conclude the 
paper by discussing how the optimal player performs against reasonable, but 
sub-optimal strategies. 

2. The Game 

Oshi-Zumo - meaning “the pushing sumo (wrestler)” - is played by two 
players who both start off with N coins. At the beginning of a game, a sumo 
wrestler is positioned at the center of a one-dimensional playing field which 
consists of 2K + 1 locations (see Figure 1). Moves are played by secretly 
choosing a number of coins less than or equal to the amount currently 
available to the respective player, but at least M. The bids are then revealed 
and the highest bidder pushes the wrestler one location towards the opponent’s 
side. If the bids are equal, the wrestler does not move. Both bids are 
deducted and the game proceeds until the money runs out or the wrestler is 
pushed off the playing field. The final position of the wrestler determines 
the winner: if he is located at the center, the game result is a draw. 
Otherwise, the player in whose half the wrestler is located loses the game. We 
call this parameterized game an [N, K, M]-Oshi-Zumo game. In this paper we 
only consider the minimal bids M = 0 and M = 1 and declare a game over if both 
bids are 0. As before, the winner is determined by the current wrestler 
position. 

ooooWoooo 
[50,4,*]-Oshi-Zumo starting position - code (50, 50, 0) 

oooooWooo 
Position after move (4,2) - code (46, 48, 1) 

Figure 1. Oshi-Zumo positions and their triple representation (W marks the 
wrestler). 
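
The rules can be summarized in a few lines of code. The following sketch 
encodes a position as the triple used in Figure 1 (coins of the first player, 
coins of the second player, wrestler position) and applies one pair of bids. 
The function names are ours, and two points are assumptions rather than rules 
stated above: a player holding fewer than M coins is allowed to bid whatever 
is left, and the "both bids are 0" termination rule is left to the caller. 

```python
def step(state, bid1, bid2, min_bid=0):
    """Apply one simultaneous move to state = (coins1, coins2, wrestler_position)."""
    coins1, coins2, pos = state
    for bid, coins in ((bid1, coins1), (bid2, coins2)):
        # Assumption: a player holding fewer than min_bid coins may bid what is left.
        if not min(min_bid, coins) <= bid <= coins:
            raise ValueError(f"illegal bid {bid} with {coins} coins")
    if bid1 > bid2:
        pos += 1  # the first player pushes towards the second player's side
    elif bid2 > bid1:
        pos -= 1
    return (coins1 - bid1, coins2 - bid2, pos)


def is_over(state, K):
    """True if the wrestler left the field (|pos| > K) or nobody has coins left."""
    coins1, coins2, pos = state
    return abs(pos) > K or (coins1 == 0 and coins2 == 0)


def result(state):
    """+1 if the first player wins, -1 if the second wins, 0 for a draw."""
    pos = state[2]
    return (pos > 0) - (pos < 0)


if __name__ == "__main__":
    print(step((50, 50, 0), 4, 2))  # the move shown in Figure 1
```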



3. Computing a Nash-Optimal Strategy 

Certain Oshi-Zumo positions possess pure winning strategies. For example, 
all positions in which the opponent has no money left and the wrestler position is 
sufficiently advanced can be won by simply bidding one coin for the remainder 
of the game. Kotani (2002) determined all such positions for the standard 
[50, 3, 1]-Oshi-Zumo game. The following list specifies some more interesting 
[50, 4, 0]-positions that can be won by the first player with a pure strategy: 

(n, n, 1) : $1 \le n \le 11$ [bid 1]        (n, n + 1, 2) : $1 \le n \le 12$ [bid 1] 

(50, n, -4) : $1 \le n \le 16$ [bid n]      (49, n, -4) : $1 \le n \le 16$ [bid n] 

All such positions can be computed by dynamic programming for small values 
of N and K because the size of the state space, $(N + 1)^2 \times (2K + 3)$, 
is only polynomial in the parameters. First, we compute the payoffs $P_i$ for 
both players at the boundary positions: 

\[ P_1(0, 0, k) = -P_2(0, 0, k) = \operatorname{sign}(k), \quad \text{for } -K \le k \le K \]

\[ P_1(n, m, \pm(K + 1)) = -P_2(n, m, \pm(K + 1)) = \pm 1, \quad \text{for } 0 \le n, m \le N \]

\[
\begin{aligned}
&\text{maximize } Z \text{ such that} \\
&\quad \text{for all } M \le j \le n_2:\quad Z \le \sum_{i=M}^{n_1} A_{ij} x_i, \\
&\quad \text{for all } M \le i \le n_1:\quad x_i \ge 0, \text{ and} \\
&\quad \sum_{i=M}^{n_1} x_i = 1
\end{aligned}
\]

\[
\begin{aligned}
&\text{minimize } Z \text{ such that} \\
&\quad \text{for all } M \le i \le n_1:\quad Z \ge \sum_{j=M}^{n_2} A_{ij} y_j, \\
&\quad \text{for all } M \le j \le n_2:\quad y_j \ge 0, \text{ and} \\
&\quad \sum_{j=M}^{n_2} y_j = 1
\end{aligned}
\]

Figure 2. Linear programs (LPs) for determining Nash-optimal mixed strategies. 



Then we search for positions with pure winning or drawing strategies, or ones 
that are lost no matter what. A position is won for player A if there exists 
an action such that for all actions of the opponent the expected payoff for A 
is 1. Declaring a position drawn or lost requires that all successor position 
values are known. We repeat this process until we do not find any new position 
values. 

A Nash-optimal strategy can be computed similarly. Starting again with 
assigning values to the boundary positions, we iterate through all positions 
with unknown expected payoff until we find one for which all successor values 
have been established. At this time we make use of the fact that optimal 
strategies $\{(i, x_i) \mid M \le i \le n_1\}$ and 
$\{(j, y_j) \mid M \le j \le n_2\}$ for players Max and Min can be found by 
solving two linear programs (see Figure 2). Max has move choices 
$M, \ldots, n_1$ and Min has actions $M, \ldots, n_2$. $x_i$ and $y_j$ denote 
the respective action probabilities. Matrix element $A_{ij}$ defines the 
payment for Max if action pair (i, j) is chosen. Because Oshi-Zumo is a 
zero-sum game, Min receives the negated amount. Z denotes the expected payoff 
for Max. This procedure eventually halts and computes the expected payoffs and 
mixed strategies for all positions. 
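
The two linear programs bound the game value from both sides: any feasible 
mix for Max guarantees at least the smallest column payoff, and any feasible 
mix for Min concedes at most the largest row payoff. When the two bounds 
coincide, both mixes are optimal. The sketch below does not solve the LPs; it 
only checks this optimality certificate for given candidate strategies, using 
exact rational arithmetic. The function names and the Rock-Paper-Scissors test 
matrix are ours. 

```python
from fractions import Fraction


def max_guarantee(A, x):
    """Smallest expected payoff for Max under mix x, over all pure actions of Min."""
    return min(sum(A[i][j] * x[i] for i in range(len(A))) for j in range(len(A[0])))


def min_guarantee(A, y):
    """Largest expected payoff Min concedes under mix y, over all pure actions of Max."""
    return max(sum(A[i][j] * y[j] for j in range(len(A[0]))) for i in range(len(A)))


def is_nash_optimal(A, x, y):
    """x and y are both optimal iff they are distributions and the two bounds meet."""
    feasible = all(p >= 0 for p in x + y) and sum(x) == 1 and sum(y) == 1
    return feasible and max_guarantee(A, x) == min_guarantee(A, y)


# Payoff for Max in Rock-Paper-Scissors; rows and columns ordered Rock, Paper, Scissors.
RPS = [[0, -1, 1],
       [1, 0, -1],
       [-1, 1, 0]]
THIRD = Fraction(1, 3)
UNIFORM = [THIRD, THIRD, THIRD]
ALWAYS_ROCK = [Fraction(1), Fraction(0), Fraction(0)]

if __name__ == "__main__":
    print(is_nash_optimal(RPS, UNIFORM, UNIFORM))      # uniform mixes for both players
    print(is_nash_optimal(RPS, ALWAYS_ROCK, UNIFORM))  # a pure strategy for Max
```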

We decided to not only create a table containing expected payoffs - which 
would be sufficient for computing values for all positions - but also to store 
the move distributions to speed up later game play and move analyses. Only 
one distribution needs to be computed and stored for each position because the 
move distribution for the second player in position (n, m, k) is identical to 
that of the first player at (m, n, -k). 

4. Implementation Issues 

In our first implementation we adopted Michel Berkelaar’s open-source soft- 
ware package lpsolve. Unfortunately, the solver ran into numerical problems 
which caused it to either give up on instances or report incorrect solutions. Im- 
plementing efficient LP solvers is by no means easy. In order to overcome the 
numerical problems we decided to replace floating-point by rational arithmetic 
in lpsolve - which turned out to be more complicated than expected. Finally, 
we took the simpler LP solver code from Press et al. (1992) and combined it 
with GMP - the GNU arbitrary precision arithmetic library - by replacing the 
float/double data types by GMP’s rational number C++ class. Solving LPs 
using rational arithmetic takes much longer than using floating-point values, 
even if the denominators are bounded. In order to speed up the Oshi-Zumo 
solver we therefore implemented a two-phase approach: whenever the fast LP 
solver reported problems or produced inconsistent results, we would start the 
slow solver based on rational arithmetic. We bounded denominators by 10^8 and 
normalized rational numbers whenever this limit was exceeded. Test runs on 
Oshi-Zumo games manageable by the floating-point based solver indicated that 
the results obtained by rational arithmetic only differed by a negligible amount. 
On a notebook PC with a 1-GHz Pentium-III CPU, solving the standard [50, 3, 1] 
game takes just 12 seconds. The C++ source code can be downloaded from 
http://www.cs.ualberta.ca/~mburo/sumo.tgz. 
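
The idea of bounding denominators can be illustrated with Python's exact 
`fractions.Fraction` type in place of GMP. The text does not say how numbers 
were normalized once the bound was exceeded; the sketch below assumes rounding 
to the closest fraction whose denominator is within the bound, which is what 
`Fraction.limit_denominator` does. The helper name `normalize` is ours. 

```python
from fractions import Fraction

MAX_DENOMINATOR = 10**8  # the bound quoted in the text


def normalize(q):
    """Return q unchanged if its denominator is within the bound; otherwise
    return the closest fraction with a denominator of at most MAX_DENOMINATOR
    (an assumed normalization rule)."""
    if q.denominator <= MAX_DENOMINATOR:
        return q
    return q.limit_denominator(MAX_DENOMINATOR)


if __name__ == "__main__":
    exact = Fraction(10**9 + 7, 3 * 10**9 + 1)
    approx = normalize(exact)
    print(approx, float(abs(approx - exact)))
```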



5. A Nash-Optimal Oshi-Zumo Strategy 

In what follows we concentrate on the [50, 3, 0] and [50, 3, 1] versions of 
the game and highlight interesting properties of their respective Nash-optimal 
strategies. We start by looking at the move distributions for the starting position: 

M = 0   position = (50, 50, 0)   value = 0.0 
bids:   0    1    2    3    4    5    6    7    8    9    10 
prob:  .083 .077 .088 .083 .092 .088 .097 .092 .099 .094 .101 

M = 1   position = (50, 50, 0)   value = 0.0 
bids:   1    2    3    4    5    6    7    8    9 
prob:  .139 .053 .146 .060 .152 .067 .156 .068 .156 



Apparent is an “odd-even” effect in which higher and lower bid probabilities 
alternate. This probability pattern occurs in many positions. Why it occurs is 
an open question. 

The smallest positions with randomization requirement are (5, 2, -3) for 
M = 0 and (6, 3, -3) for M = 1. The move distributions are as follows: 

M = 0                        M = 1 
position = (5, 2, -3)        position = (6, 3, -3) 
value_1 = -0.5               value_1 = -0.5 
bid_1:  1   2                bid_1:  1   3 
prob:   .5  .5               prob:   .5  .5 
bid_2:  0   2                bid_2:  [illegible] 
prob:   .5  .5               prob:   .5  .5 



In 5,271 cases of the 23,409 possible [50, 3, 0]-positions, and in 4,057 cases 
for M = 1, more than one move has to be considered. To illustrate how complex 
the move decision can be, we present two positions with a high number of holes 
in the move distribution: 

M = 0   position = (17, 34, 3)   value_1 = 0.047 
bid_1:  0    2    3    4    5    6    7    8    10   12   14   17 
prob:  .482 .015 .008 .017 .016 .020 .022 .026 .069 .087 .107 .126 
bid_2:  1    2    3    4    5    6    7    17 
prob:  .066 .065 .086 .080 .082 .082 .058 .476 

M = 1   position = (20, 32, 3)   value_1 = 0.333 
bid_1:  1    2    3    5    6    7    9    11   12   14   17   18   20 
prob:  .369 .004 .038 .050 .007 .038 .125 .069 .048 .039 .080 .030 .096 
bid_2:  2    3    4    5    6    9    10   11   12   20 
prob:  .136 .021 .079 .029 .059 .016 .188 .091 .043 .333 


Given such complex distributions, the question arises how well human play- 
ers can play Oshi-Zumo. 

6. How Good is Optimal? 

Playing any mixed or pure strategy against a Nash-optimal player results 
in an expected payoff no better than the expected value E of a game between 
two Nash players. In contrast, the expected value of any pure strategy that 
picks actions from the set an optimal strategy considers, is exactly E when 
playing against the Nash-optimal player. This follows from the fact that all 
actions with non-zero probability have the same expected value. Therefore, 
the Nash-optimal solution is far from optimal with respect to exploiting simple 
(pure) strategies, such as playing Rock all the time in a sequence of Rock-Paper- 
Scissors games. In Rock-Paper-Scissors the Nash strategy cannot win anything 
against any other strategy in the long run. However, in more complex games - 
such as Oshi-Zumo or Poker - it can, because not all actions have non-zero 
probability in all situations. 
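
The Rock-Paper-Scissors claim is easy to check numerically. The sketch below, 
with names of our choosing, computes the expected payoff of one mixed strategy 
against another and compares the Nash mix with the best response against an 
opponent who always plays Rock. 

```python
from fractions import Fraction

# Payoff for the row player; rows and columns ordered Rock, Paper, Scissors.
RPS = [[0, -1, 1],
       [1, 0, -1],
       [-1, 1, 0]]


def expected_payoff(A, x, y):
    """Expected payoff for the row player when rows follow mix x and columns mix y."""
    return sum(A[i][j] * x[i] * y[j] for i in range(len(A)) for j in range(len(A[0])))


THIRD = Fraction(1, 3)
NASH = [THIRD, THIRD, THIRD]
ALWAYS_ROCK = [Fraction(1), Fraction(0), Fraction(0)]
ALWAYS_PAPER = [Fraction(0), Fraction(1), Fraction(0)]

if __name__ == "__main__":
    print("Nash mix vs. always Rock:    ", expected_payoff(RPS, NASH, ALWAYS_ROCK))
    print("always Paper vs. always Rock:", expected_payoff(RPS, ALWAYS_PAPER, ALWAYS_ROCK))
```

The Nash mix neither wins nor loses against the Rock player on average, while 
the exploiting strategy wins every game. 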

A player who just memorizes one move from a Nash-optimal strategy for 
each position does not lose money against a Nash-player in the long run. How 
much does a player lose who occasionally plays moves not played by a Nash- 
player and how well do simple hand-crafted strategies play? To answer these 
questions we wrote a program that played a large number of games between a 
Nash-optimal strategy and several simple move selection algorithms. Figure 3 
presents the tournament results. As expected, the completely random player 
loses almost every game. The player that randomly chooses bids in the interval 
formed by the minimal and maximum Nash bid performs much better and loses 
only about 0.035 units per game for M = 0 and 0.01 for M = 1. Simply 
choosing moves in a small fixed interval also leads to good results and shows 
how easy it is to look good against a Nash player. Also some fairly simple pure 
strategies perform surprisingly well. 

M. Buro 

M = 0                                   M = 1 
random 0..#             -.97882         random 1..#                   -.98216 
random Nash range       -.035           random Nash range             -.0105 
random 1..min(6,#)      -.31884         random min(2,#)..min(6,#)     -.3533 
random 1..min(5,#)      -.16971         random min(2,#)..min(5,#)     -.21524 
random 1..min(4,#)      -.05115         random min(2,#)..min(4,#)     -.05683 
random 1..min(3,#)      -.00292         random min(2,#)..min(3,#)     -.00372 
random 1..min(2,#)      +.000645        if #>2 2 else 1               +.00039 
1                       -.002765        if #>3 3 elif #>2 2 else 1    -.02987 
if #>2 2 else 1         -.00156 

Figure 3. The average payoff of various simple move-selection algorithms 
playing 200,000 [50, 3, M]-games against a Nash-optimal strategy. # denotes 
the current number of coins left for the heuristic player. 

A more interesting question is therefore how to adapt to players and exploit 
their weaknesses while minimizing the risk of being exploited. We think that 
using games simpler than, say, Poker but harder than Rock-Paper-Scissors as 
test domains can shed light on this interesting problem, which appears to be 
the last remaining hurdle on the way to Poker programs stronger than human 
players (Billings et al., 2003). Oshi-Zumo is a suitable candidate because its 
Nash-optimal strategy is non-trivial, but can be computed quickly. 



Abstract We define an infinite class of 2-pile subtraction games, where the amount that 
can be subtracted from both piles simultaneously is a function f of the size of the 
piles. Wythoff’s game is a special case. For each game, the 2nd player winning 
positions are a pair of complementary sequences, some of which are related to 
well-known sequences, but most are new. The main result is a theorem giving 
necessary and sufficient conditions on f so that the sequences are 2nd player 
winning positions. Sample games are presented, strategy complexity questions 
are discussed, and possible further studies are indicated. 

Keywords: 2-pile subtraction games, complexity of games, integer sequences 

1. Introduction 

Dominican International Forwarding is the finest international transportation 
company, according to its web site (in Spanish) at http://www.dif.com.do . (An 
optional Google rendition confirms once again that mechanical translation is 
still in its infancy.) What is the connection of DIF to games? 

While pondering this question, let us introduce our first game: 

$G_1$ from dif.com 

Given two piles of tokens (x, y) of sizes x, y, with $0 \le x \le y < \infty$. 
Two players alternate removing tokens from the piles. 

(a) Remove any positive number of tokens from a single pile, possibly the 
entire pile. 

(b) Remove a positive number of tokens from each pile, say $k$, $\ell$, so 
that $|k - \ell|$ is not too large with respect to the position $(x_1, y_1)$ 
moved to from $(x_0, y_0)$, namely, $|k - \ell| < x_1 + 1$ ($x_1 \le y_1$). 

The player making the move after which both piles are empty (a leaf of the 
game), wins; the opponent loses. Thus, (11, 15) -> (3, 4) or to (2, 4) are 
legal moves, but (11, 15) -> (2, 3) or to (0, 3) are not. The position (0, 0) 
is the only leaf of this and our following games. 

n      0   1   2   3   4   5   6   7   8   9  10  11   12   13   14   15   16 
A_n    0   1   3   4   5   7   8   9  10  12  13  14   15   16   18   19   20 
B_n    0   2   6  11  17  25  34  44  55  68  82  97  113  130  149  169  190 

Table 1. The first few P-positions for $G_1$. 

For any acyclic combinatorial game without ties, such as $G_1$, a position 
u = (x, y) is labeled N (Next player win) if the player moving from u can win; 
otherwise it is a P-position (Previous player win). Denote by $\mathcal{P}$ 
the set of all P-positions, by $\mathcal{N}$ the set of all N-positions, and 
by F(u) the set of all (direct) followers or options of u. It is easy to see 
that for any acyclic game, 

\[ u \in \mathcal{P} \text{ if and only if } F(u) \subseteq \mathcal{N}, \tag{1} \]

\[ u \in \mathcal{N} \text{ if and only if } F(u) \cap \mathcal{P} \ne \emptyset. \tag{2} \]

Indeed, player I, beginning from an N-position, will move to a P-position, 
which exists by (2), and player II has no choice but to go to an N-position, 
by (1). Since the game is finite and acyclic, player I will eventually win by 
moving to a leaf, which is clearly a P-position. 
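
Characterizations (1) and (2) translate directly into a brute-force 
classification of small positions. The sketch below enumerates the followers 
of a position under the move rules (a) and (b) of the first game and labels a 
position P exactly when all its followers are N. It is meant only as a check 
on small cases; the function names are ours. 

```python
from functools import lru_cache


def followers(x0, y0):
    """Positions reachable in one move from (x0, y0), x0 <= y0, each as a sorted pair."""
    result = set()
    # Type (a): remove a positive number of tokens from a single pile.
    for x1 in range(x0):
        result.add(tuple(sorted((x1, y0))))
    for y1 in range(y0):
        result.add(tuple(sorted((x0, y1))))
    # Type (b): remove k > 0 and l > 0 tokens with |k - l| < (smaller new pile) + 1.
    for x1 in range(x0):
        for y1 in range(y0):
            k, l = x0 - x1, y0 - y1
            if abs(k - l) < min(x1, y1) + 1:
                result.add(tuple(sorted((x1, y1))))
    return result


@lru_cache(maxsize=None)
def is_p_position(x, y):
    """P-position: every follower is an N-position (vacuously true at the leaf (0, 0))."""
    return all(not is_p_position(*f) for f in followers(x, y))


def p_positions(bound):
    """All P-positions (x, y) with x <= y <= bound."""
    return [(x, y) for x in range(bound + 1) for y in range(x, bound + 1)
            if is_p_position(x, y)]


if __name__ == "__main__":
    print(p_positions(17))
```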

Let $S \subset \mathbb{Z}_{\ge 0}$, $S \ne \mathbb{Z}_{\ge 0}$, and 
$\bar{S} = \mathbb{Z}_{\ge 0} \setminus S$. The minimum excluded value of S is 

\[ \operatorname{mex} S = \min \bar{S} = \text{least nonnegative integer not in } S. \]

Note that mex of the empty set is 0. 

Table 1 portrays the first few P-positions $(A_n, B_n)$ of $G_1$. The reader is 
encouraged to verify that the first few entries of the table are indeed P-positions 
of the game. For a technical reason we put $B_{-1} = -1$. In Section 4 we prove, 
as a simple corollary to a considerably more general result, 

Theorem 1. For $G_1$, $\mathcal{P} = \bigcup_{i=0}^{\infty} (A_i, B_i)$, where, 
for all $n \in \mathbb{Z}_{\ge 0}$, 

\[ A_n = \operatorname{mex}\{A_i, B_i : 0 \le i < n\}, \tag{3} \]

\[ B_n = B_{n-1} + A_n + 1. \tag{4} \]
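
Recurrences (3) and (4) are straightforward to evaluate. The following sketch 
generates the pairs $(A_n, B_n)$, using the convention $B_{-1} = -1$, so that 
the output can be compared with Table 1. The function names are ours. 

```python
def mex(values):
    """Least nonnegative integer not contained in the set `values`."""
    m = 0
    while m in values:
        m += 1
    return m


def g1_pairs(count):
    """First `count` pairs (A_n, B_n) from (3) and (4), starting from B_{-1} = -1."""
    seen = set()
    pairs = []
    prev_b = -1
    for _ in range(count):
        a = mex(seen)        # recurrence (3)
        b = prev_b + a + 1   # recurrence (4)
        pairs.append((a, b))
        seen.update((a, b))
        prev_b = b
    return pairs


if __name__ == "__main__":
    for n, (a, b) in enumerate(g1_pairs(17)):
        print(n, a, b)
```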

The game $G_1$ is a special case of the following new family of combinatorial 
games defined on two piles of finitely many tokens, with two types of moves: a 
move of type (a), and a more general move of type (b), namely, $|k - \ell|$ 
depends on the present and next position. Denote the present position by 
$(x_0, y_0)$ and the position moved to by $(x_1, y_1)$. We then require, 

\[ |(y_0 - y_1) - (x_0 - x_1)| = |(y_0 - x_0) - (y_1 - x_1)| < f(x_1, y_1, x_0), \tag{5} \]

where f is a real constraint function depending on $x_1, y_1, x_0$. If also 
$(y_0 - x_0) \ge (y_1 - x_1)$, then the requirement becomes 
$y_0 < f(x_1, y_1, x_0) + y_1 - x_1 + x_0$. The type (b) move defined for 
$G_1$ is the special case $f = x_1 + 1$. Here are descriptions of two 
additional games. 

$G_2$ from even.com 

Same as $G_1$, except that in (b), $|k - \ell| < x_1 + 1$ is replaced by 
$|k - \ell| < x_0 - x_1$. 

$G_3$ from dif.com 

Same as $G_1$, except that in (b), $|k - \ell| < x_1 + 1$ is replaced by 
$|k - \ell| < y_1 - x_1 + 1$. 

n      0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16 
A_n    0   1   3   4   5   7   9  11  12  13  15  16  17  19  20  21  23 
B_n    0   2   6   8  10  14  18  22  24  26  30  32  34  38  40  42  46 

Table 2. The first few P-positions for $G_2$. 

Table 3. The first few P-positions for $G_3$. 

The first few P-positions for $G_2$ and $G_3$ are listed in Tables 2 and 3 
respectively. In Section 4 we also prove, 

Theorem 2. For $G_2$ and $G_3$, $\mathcal{P} = \bigcup_{i=0}^{\infty} (A_i, B_i)$, 
where, for all $n \in \mathbb{Z}_{\ge 0}$, $A_n$ is given by (3), and 
$B_n = 2A_n$ for $G_2$; and $B_0 = 0$, and for $n \in \mathbb{Z}_{\ge 1}$, 
$B_n = A_n + 2^n - 1$ for $G_3$. 

Each of our games is associated with a pair of complementary sequences $A_n$, 
$B_n$. A special case is the well-known (classical) Wythoff (1907) game. See 
also Berlekamp, Conway, and Guy (1982), Blass and Fraenkel (1990), Connell 
(1959), Coxeter (1953), Dress (1999), Fraenkel (1982), Fraenkel (1984), 
Fraenkel and Borosh (1973), Fraenkel and Ozery (1998), Landman (2002), Silber 
(1976), Silber (1977), and Yaglom and Yaglom (1967). In fact, the classical 
Wythoff game is the case $f(x_1, y_1, x_0) = 1$, whereas the generalization 
considered in Fraenkel (1982) is the case $f(x_1, y_1, x_0) = t$ for any fixed 
$t \in \mathbb{Z}_{\ge 1}$. Whereas the winning strategy of Wythoff’s game is 
associated with sequences related to algebraic integers of the form 
$(2 - t + \sqrt{t^2 + 4})/2$ ($t = 1$ is the golden section), our games give 
rise to an infinity of sequences, some well-known, but mostly new ones. 

In Section 2 we shall see that the pair of sequences of P-positions associated 
with $G_1$ is related to a “self-generating” sequence (Sloane, 1999) of Hofstadter. 
In Section 3 we indicate how the P-positions of $G_2$ are related to another well- 
known sequence. The central result appears in Section 4, where a general 
theorem is formulated and proved, that yields winning strategies for a large 
class of 2-pile subtraction games. Roughly speaking it states that for every 
2-pile subtraction game, if its constraint function f is “positive”, “monotone” 
and “semi-additive”, then it has P-positions $(A_n, B_n)$, where $A_n$ satisfies (3), 
and $B_n$ has an explicit form depending on f. In a complementary proposition 
we show that positivity, monotonicity and semi-additivity are also necessary, in 
the sense that if any one of them is dropped, then there are constraint functions 
and their associated games G, such that the positions claimed to be P-positions 
by the central result, are not P-positions for these G. Theorems 1 and 2 are then 
deduced as a simple corollary of the central result. In Section 5 we give a random 
assortment of sample games with their P-positions that can be produced from 
the central theorem. Questions of complexity and related issues are discussed 
in Section 6. The epilogue in Section 7 wraps up with some concluding remarks 
and indications for further study. 

2. The Gödel, Escher, Bach Connection 

On p. 73 of Hofstadter’s (1979) famous book the reader is asked to 
characterize the following sequence: 

\[ \{B'_n\}_{n \ge 0} = \{1, 3, 7, 12, 18, 26, 35, 45, 56, \ldots\}. \]

Answer: the sequence {2, 4, 5, 6, 8, 9, 10, 11, ...} constitutes the set 
of differences of consecutive terms of $B'_n$, as well as the complement with 
respect to $\mathbb{Z}_{>0}$ of $B'_n$. For our purposes it is convenient to 
preface 0 to the latter sequence, so we define 

\[ \{A'_n\}_{n \ge 0} = \{0, 2, 4, 5, 6, 8, 9, 10, 11, \ldots\}, \]

which is the complement with respect to $\mathbb{Z}_{\ge 0}$ of $B'_n$. Now 
$A'_9 = \operatorname{mex}\{A'_i, B'_i : 0 \le i < 9\} = 13$, so 
$B'_9 = 56 + 13 = 69$. We see that in general, for all 
$n \in \mathbb{Z}_{\ge 0}$, 

\[ A'_n = \operatorname{mex}\{A'_i, B'_i : 0 \le i < n\}, \tag{6} \]

which has the form (3), and 

\[ B'_0 = 1, \qquad B'_n = B'_{n-1} + A'_n \quad (n \ge 1), \tag{7} \]

which is similar to (4). Moreover, the following proposition shows that there 
is a very close relationship between the P-positions of the game $G_1$ and 
Hofstadter’s sequence $B'_n$, namely, $B'_n$ exceeds $B_n$ by 1. This can be 
observed by comparing the bottom row of Table 1 with $B'_n$. 

Proposition 1. $A'_n = A_n + 1$ ($n \ge 1$), $B'_n = B_n + 1$ ($n \ge 0$), 
where $A'_n$, $B'_n$ are given by (6), (7) respectively, and $A_n$, $B_n$ by 
(3), (4) respectively. 

Proof. We see that the assertions are true for small n. Suppose they hold for 
all $i \le n$. Then 

\[ A'_{n+1} = \operatorname{mex}\{A'_i, B'_i : 0 \le i \le n\} = \operatorname{mex}\{0, A_i + 1, B_i + 1 : 0 \le i \le n\}. \]

Put $S'_n = \{0, A_i + 1, B_i + 1 : 0 \le i \le n\}$, 
$S_n = \{A_i, B_i : 0 \le i \le n\}$. If, say, the integer interval $[0, k]$ 
is in $S_n$ for some $k \in \mathbb{Z}_{\ge 0}$ and $k + 1 \notin S_n$, then 
$k + 1 \in S'_n$ and $k + 2 \notin S'_n$. It follows that 
$\operatorname{mex} S'_n = A_{n+1} + 1$. Also, 
$B'_{n+1} = B'_n + A'_{n+1} = B_n + 1 + A_{n+1} + 1 = B_{n+1} + 1$. ■ 

Thus the P-positions of G\ constitute a “translation by 1” of the Hofstadter 
sequence, that is, P n +i - B n — A n+ i + 1. So A n + 1 is the difference (dif) 
and A n the complement (com) of B n : they are products of dif.com. 
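
Proposition 1 can be checked numerically for small indices. The following Python sketch is only an illustration; the function names `g1_pairs` and `hofstadter_pairs` are hypothetical. It builds the pairs of $G_1$ from (3), (4) and the Hofstadter pairs from (6), (7), assuming the starting values $A_0 = B_0 = 0$ and $A'_0 = 0$, $B'_0 = 1$ stated in the text.

```python
def g1_pairs(count):
    """First `count` pairs (A_n, B_n) of G_1, using (3) and (4)."""
    pairs, used, a = [(0, 0)], {0}, 0
    while len(pairs) < count:
        while a in used:              # A_n = mex of all earlier A_i and B_i
            a += 1
        b = pairs[-1][1] + a + 1      # (4): B_n = B_{n-1} + A_n + 1
        pairs.append((a, b))
        used.update((a, b))
    return pairs


def hofstadter_pairs(count):
    """First `count` pairs (A'_n, B'_n), using (6) and (7)."""
    pairs, used, a = [(0, 1)], {0, 1}, 0
    while len(pairs) < count:
        while a in used:              # (6): A'_n = mex of earlier A'_i and B'_i
            a += 1
        b = pairs[-1][1] + a          # (7): B'_n = B'_{n-1} + A'_n
        pairs.append((a, b))
        used.update((a, b))
    return pairs


if __name__ == "__main__":
    g, h = g1_pairs(10), hofstadter_pairs(10)
    for n in range(10):
        print(n, g[n], h[n])
```

In this sketch the second components of `hofstadter_pairs(9)` reproduce the nine listed terms 1, 3, 7, ..., 56, and each exceeds the corresponding $B_n$ of $G_1$ by 1.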

3. Prouhet-Thue-Morse 

It is not hard to see that the sequence $A_n$ $(n \ge 1)$ of $G_2$ contains precisely
all positive integers whose binary representation ends in an even number of zeros.
(Because of this, $G_2$ originates from even.com: “www.even.com is the best
place to find information and sources for even”, it says on its webpage.) The
sequence $A_n$ is also lexicographically minimal with respect to the property that
the parity of the number of 1's in the binary expansion alternates. Furthermore,
it is lexicographically minimal with respect to the property that its complement
is the double of the sequence: if $m$ appears in $A_n$, then $2m$ appears in
$B_n$. In particular, $B_n$ contains precisely all positive integers whose binary
representation ends in an odd number of zeros (Carlitz, Scoville, and Hoggatt,
1972). The sequence

$C_n = 0^{A_1 - A_0}\, 1^{A_2 - A_1}\, 0^{A_3 - A_2} \cdots 0^{A_{2n+1} - A_{2n}}\, 1^{A_{2n+2} - A_{2n+1}} \cdots$

$= 011010011001011010010\ldots$,

is the Prouhet-Thue-Morse sequence, which arises in many different areas of
mathematics. See the charming paper (Allouche and Shallit, 1999), which also
contains $A_n$, for many further properties of these sequences.
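
The run-length description of the Prouhet-Thue-Morse sequence can be tested on a finite prefix. The sketch below is an illustration under the assumption that $A_0 = 0$ and that $A_n$ $(n \ge 1)$ runs through the positive integers with an even number of trailing binary zeros, as stated above; the helper names are hypothetical.

```python
def trailing_zeros(m):
    """Number of trailing zeros in the binary representation of m > 0."""
    return (m & -m).bit_length() - 1


def a_sequence(limit):
    """A_0 = 0 followed by all m <= limit with an even number of trailing zeros."""
    return [0] + [m for m in range(1, limit + 1) if trailing_zeros(m) % 2 == 0]


def run_word(a):
    """Concatenate runs 0^(A_1-A_0) 1^(A_2-A_1) 0^(A_3-A_2) ... as a list of bits."""
    bits = []
    for i in range(len(a) - 1):
        bits.extend([i % 2] * (a[i + 1] - a[i]))   # run i uses symbol i mod 2
    return bits


def thue_morse(length):
    """Prefix of the Thue-Morse sequence: parity of the number of 1 bits of n."""
    return [bin(n).count("1") % 2 for n in range(length)]


if __name__ == "__main__":
    word = run_word(a_sequence(64))
    print("".join(map(str, word[:21])))
    print(word == thue_morse(len(word)))
```

The comparison works because the parity of the number of 1 bits changes between $n - 1$ and $n$ exactly when $n$ has an even number of trailing zeros, so the runs of the sequence start at the terms $A_n$.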

4. A Master Theorem 

The three previously described games $G_1$, $G_2$, $G_3$ are special cases of an 
infinite family of games that we now formulate. We shall then provide a general 
winning strategy for this family of games and prove its validity. 

General 2-pile subtraction games 

Given two piles of tokens $(x, y)$ of sizes $x$, $y$, with $0 \le x \le y < \infty$, whose
P-positions are $\mathcal{P} = \bigcup_{i=0}^{\infty} (A_i, B_i)$. Two players alternate removing tokens
from the piles:

(aa) Remove any positive number of tokens from a single pile, possibly the
entire pile.

(bb) Remove a positive number of tokens from each pile, say $k$ and $\ell$, so that
$|k - \ell|$ is not too large with respect to the position $(x_1, y_1)$ moved to from
$(x_0, y_0)$, namely, $|k - \ell| < f(x_1, y_1, x_0)$, equivalently:

$|(y_0 - y_1) - (x_0 - x_1)| = |(y_0 - x_0) - (y_1 - x_1)| < f(x_1, y_1, x_0)$,    (8)

where the constraint function $f(x_1, y_1, x_0)$ is integer-valued and satisfies:

■ Positivity: $f(x_1, y_1, x_0) > 0$ $\forall\, y_1 \ge x_1 \ge 0$, $\forall\, x_0 > x_1$.

■ Monotonicity: $x'_0 < x_0 \implies f(x_1, y_1, x'_0) \le f(x_1, y_1, x_0)$.

■ Semi-additivity (or generalized triangle inequality) on the P-positions,
namely: for $n > m \ge 0$,

$\sum_{i=0}^{m} f(A_{n-1-i}, B_{n-1-i}, A_{n-i}) \ge f(A_{n-m-1}, B_{n-m-1}, A_n)$.

The player making the move after which both piles are empty wins; the opponent
loses.

In view of (8), positivity is a natural condition. Without positivity, a move of
type (bb) is not even possible. Monotonicity appears to be a minimal requirement
to enforce positivity. Semi-additivity is a convenient condition to have,
and many functions are semi-additive. Note that $G_1$, $G_2$, $G_3$ clearly satisfy
positivity and monotonicity; $G_1$ and $G_3$, in whose functions $f$ there is no $A_n$,
are clearly semi-additive; and $G_2$ is semi-additive with equality. (See also the
proof of Theorems 1 and 2 at the end of this section.)

Theorem 3. Let $\mathcal{S} = \bigcup_{i=0}^{\infty} (A_i, B_i)$, where, for all $n \in \mathbb{Z}_{\ge 0}$, $A_n$ is given by
(3), $B_0 = 0$, and for all $n \in \mathbb{Z}_{>0}$,

$B_n = f(A_{n-1}, B_{n-1}, A_n) + B_{n-1} - A_{n-1} + A_n$.    (9)

If $f$ is positive, monotone and semi-additive, then $\mathcal{S}$ is the set of P-positions of
a general 2-pile subtraction game with constraint function $f$.
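
The recurrence in Theorem 3 translates directly into a short program. The following Python sketch is an illustration rather than part of the original text: the name `s_pairs` and the lambda names `g1`, `g2`, `g3` are hypothetical, and the three constraint functions are the ones attributed to $G_1$, $G_2$, $G_3$ in the proof of Theorems 1 and 2.

```python
from typing import Callable, List, Tuple

# Constraint function f(x1, y1, x0) in the argument order used in (8) and (9).
Constraint = Callable[[int, int, int], int]


def s_pairs(f: Constraint, count: int) -> List[Tuple[int, int]]:
    """First `count` pairs (A_n, B_n): A_n by the mex rule (3), B_n by (9)."""
    pairs = [(0, 0)]          # A_0 = mex of the empty set = 0, B_0 = 0
    used = {0}
    a = 0
    while len(pairs) < count:
        a_prev, b_prev = pairs[-1]
        while a in used:      # the mex never decreases, so one pointer suffices
            a += 1
        b = f(a_prev, b_prev, a) + b_prev - a_prev + a   # recurrence (9)
        pairs.append((a, b))
        used.update((a, b))
    return pairs


g1 = lambda x1, y1, x0: x1 + 1        # constraint function of G_1
g2 = lambda x1, y1, x0: x0 - x1       # constraint function of G_2
g3 = lambda x1, y1, x0: y1 - x1 + 1   # constraint function of G_3

if __name__ == "__main__":
    print(s_pairs(g1, 6))   # pairs of G_1
    print(s_pairs(g2, 6))   # pairs of G_2, where B_n = 2 A_n
    print(s_pairs(g3, 6))   # pairs of G_3, where B_n = A_n + 2^n - 1
```

Keeping the mex as a single increasing pointer makes the table for the first $n$ pairs cost time linear in $A_n$, apart from the cost of evaluating $f$.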

Proof. The definition of $A_n$ implies directly,

$A_n > A_{n-1}$    (10)

for all $n \in \mathbb{Z}_{>0}$. From (9) we have, for all $n \in \mathbb{Z}_{>0}$,

$B_n - B_{n-1} = f(A_{n-1}, B_{n-1}, A_n) + A_n - A_{n-1}$,    (11)

$B_n - A_n = f(A_{n-1}, B_{n-1}, A_n) + B_{n-1} - A_{n-1}$.    (12)

Now $f(A_0, B_0, A_1) > 0$ by positivity, so $B_1 - B_0 \ge 2$ by (10), (11). Hence
we get, by induction on $n$,

$B_n - B_m \ge 2$ for all $n > m \ge 0$.    (13)

Similarly we get from (12),

$B_n - A_n > B_m - A_m \ge 0$ for all $n > m \ge 0$.    (14)

Let $A = \bigcup_{n=1}^{\infty} A_n$ and $B = \bigcup_{n=1}^{\infty} B_n$. Then $A$ and $B$ are complementary sets of
integers, i.e., $A \cup B = \mathbb{Z}_{\ge 1}$ (by (3)), and $A \cap B = \emptyset$. Indeed, if $A_n = B_m$, then
$n > m$ implies that $A_n$ is the mex of a set containing $B_m = A_n$, a contradiction
to the mex definition; and $1 \le n \le m$ is impossible since

$B_m = f(A_{m-1}, B_{m-1}, A_m) + B_{m-1} - A_{m-1} + A_m$
$\ge f(A_{m-1}, B_{m-1}, A_{m-1} + 1) + A_n$

(by (10), (14) and monotonicity)

$> A_n$ (by positivity).

Since $B_n - B_{n-1} \ge 2$ for all $n \ge 1$ by (13), and since $A$ and $B$ are
complementary,

$A_n - A_{n-1} \in \{1, 2\}$    (15)

for all $n \in \mathbb{Z}_{>0}$. Denote by $\mathcal{P}'$ the set of all positions $(A_n, B_n)$ satisfying (3)
and (9), and let $\mathcal{N}' = \mathbb{Z}_{\ge 0}^2 \setminus \mathcal{P}'$. For showing that $\mathcal{P}' = \mathcal{P}$ and $\mathcal{N}' = \mathcal{N}$, it
evidently suffices to show two things:

I. Every move from any $(A_n, B_n) \in \mathcal{P}'$ results in a position in the
complement $\mathcal{N}'$.

II. From every position $(x, y)$ in the complement $\mathcal{N}'$, there is a move to
some $(A_n, B_n) \in \mathcal{P}'$.

(It is useful to note that these two conditions are also necessary: I implies
that all positions reachable in one move from a P-position are N-positions;
whereas II shows that at least one P-position is reachable in one move from
an N-position.)

I. A move of type (aa) from $(A_n, B_n) \in \mathcal{P}'$ has the form $(x, B_n)$ or $(A_n, y)$
$(x < A_n, y < B_n)$. Both are in $\mathcal{N}'$ since the sequences $A_n$, $B_n$ are strictly
increasing. Suppose there is a move of type (bb): $(A_n, B_n) \to (A_j, B_j) \in \mathcal{P}'$.
Then $j < n$. Note that

$|(B_n - B_j) - (A_n - A_j)| = |(B_n - A_n) - (B_j - A_j)| = (B_n - A_n) - (B_j - A_j)$

by (14). By iterating (9) we have,

$(B_n - A_n) - (B_j - A_j)$
$= f(A_{n-1}, B_{n-1}, A_n) + (B_{n-1} - A_{n-1}) - (B_j - A_j)$
$= f(A_{n-1}, B_{n-1}, A_n) + f(A_{n-2}, B_{n-2}, A_{n-1}) + (B_{n-2} - A_{n-2}) - (B_j - A_j)$
$= \cdots = \sum_{i=0}^{n-j-1} f(A_{n-1-i}, B_{n-1-i}, A_{n-i}) \ge f(A_j, B_j, A_n)$,

where the inequality follows from semi-additivity. Thus

$|(B_n - B_j) - (A_n - A_j)| \ge f(A_j, B_j, A_n)$,

contradicting condition (bb).

II. Let $(x, y) \in \mathcal{N}'$ $(0 \le x \le y)$. Since $A$ and $B$ are complementary, every
$n \in \mathbb{Z}_{>0}$ appears exactly once in exactly one of $A$ and $B$. Therefore we have
either $x = B_n$ or else $x = A_n$ for some $n \ge 0$.

(i) $x = B_n$. Then move $y \to A_n$. This is always possible since if $n = 0$,
then $y > A_0 = B_0$; whereas $A_n < B_n$ for $n \ge 1$ by (14).

(ii) $x = A_n$. If $y > B_n$, move $y \to B_n$. So suppose that $A_n \le y < B_n$.
Then $n \ge 1$. For any $m \in \{0, \ldots, n - 1\}$ we have by (9) and by monotonicity,

$(B_{m+1} - A_{m+1}) - (B_m - A_m) = f(A_m, B_m, A_{m+1}) \le f(A_m, B_m, A_n)$.

Thus $B_m - A_m + f(A_m, B_m, A_n) \ge B_{m+1} - A_{m+1}$. Therefore the intervals
$[B_m - A_m,\ B_m - A_m + f(A_m, B_m, A_n))$ (closed on the left, open on the right)
cover $\mathbb{Z}_{\ge 0}$ for $m \ge 0$. Hence

$y - A_n \in [B_m - A_m,\ B_m - A_m + f(A_m, B_m, A_n))$    (16)

for a smallest $m \in \{0, \ldots, n - 1\}$. We then move $(x, y) \to (A_m, B_m)$. This
move is legal, since:

■ $m < n$. Indeed, $y - A_n < B_n - A_n = f(A_{n-1}, B_{n-1}, A_n) + B_{n-1} - A_{n-1}$.
Thus $m \le n - 1$ by (16).

■ $y > B_m$. By (16), $y \ge A_n + B_m - A_m$. Hence $y - B_m \ge A_n - A_m > 0$.

■ The move satisfies (bb):

$|(y - B_m) - (x - A_m)| = |(y - A_n) - (B_m - A_m)| = (y - A_n) - (B_m - A_m)$,

where the last equality follows from (16) and our choice of $m$. We
thus have $|(y - A_n) - (B_m - A_m)| = (y - A_n) - (B_m - A_m) < f(A_m, B_m, A_n)$
by (16). ■
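
For small pile sizes the statement of Theorem 3 can be compared with an exhaustive search. The sketch below is an illustration only, and its modelling choices are assumptions rather than part of the original text: positions are stored with the smaller pile first, a move of type (bb) is tested with $|k - \ell| < f(x_1, y_1, x_0)$ after re-sorting the resulting piles, and the names `s_pairs_up_to`, `brute_force_p_positions`, `g1`, `g2`, `g3` are hypothetical.

```python
def s_pairs_up_to(f, limit):
    """Pairs (A_n, B_n) from (3) and (9) with B_n <= limit."""
    pairs, used, a = [(0, 0)], {0}, 0
    while True:
        a_prev, b_prev = pairs[-1]
        while a in used:                      # A_n = mex of earlier A_i, B_i
            a += 1
        b = f(a_prev, b_prev, a) + b_prev - a_prev + a   # recurrence (9)
        if b > limit:
            return pairs
        pairs.append((a, b))
        used.update((a, b))


def brute_force_p_positions(f, limit):
    """All P-positions (x, y), 0 <= x <= y <= limit, by backward induction."""
    is_p = {}
    for y in range(limit + 1):
        for x in range(y + 1):
            # Type (aa): shrink exactly one pile, then put the smaller pile first.
            options = [(x1, y) for x1 in range(x)]
            options += [tuple(sorted((x, y1))) for y1 in range(y)]
            # Type (bb): remove k > 0 and l > 0 tokens subject to |k - l| < f.
            for k in range(1, x + 1):
                for l in range(1, y + 1):
                    x1, y1 = sorted((x - k, y - l))
                    if abs(k - l) < f(x1, y1, x):
                        options.append((x1, y1))
            # A position is P exactly when no option is itself a P-position.
            is_p[(x, y)] = not any(is_p[o] for o in options)
    return sorted(pos for pos, flag in is_p.items() if flag)


g1 = lambda x1, y1, x0: x1 + 1        # constraint function of G_1
g2 = lambda x1, y1, x0: x0 - x1       # constraint function of G_2
g3 = lambda x1, y1, x0: y1 - x1 + 1   # constraint function of G_3

if __name__ == "__main__":
    for name, f in (("G1", g1), ("G2", g2), ("G3", g3)):
        print(name, brute_force_p_positions(f, 30) == s_pairs_up_to(f, 30))
```

Every option of $(x, y)$ either keeps $y$ and lowers $x$, or has a strictly smaller larger pile, so filling the table by increasing $y$ and then increasing $x$ guarantees that all options are already classified.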



In a sense, Theorem 3 is best possible. This is enunciated below. 

Proposition 2. There exist 2-pile subtraction games with constraint functions
$f$ which lack precisely one of positivity, monotonicity or semi-additivity, such
that $\mathcal{S} \ne \mathcal{P}$, where $\mathcal{S} = \bigcup_{i=0}^{\infty} (A_i, B_i)$, and $A_i$ satisfies (3) $(i \in \mathbb{Z}_{\ge 0})$; $B_0 = 0$,
$B_i$ satisfies (9) $(i \in \mathbb{Z}_{>0})$.

Proof. Consider the function $f(x_1, y_1, x_0) = (x_0 - x_1)^2$. It is clearly positive
and monotone. However, $(A_n - A_{n-1})^2 + (A_{n-1} - A_{n-2})^2 < (A_n - A_{n-2})^2$,
no matter whether $A_n - A_{n-1} = A_{n-1} - A_{n-2} = 1$ or otherwise, so $f$ is not
semi-additive. From (9) we get $B_n = B_{n-1} + (A_n - A_{n-1})(A_n - A_{n-1} + 1)$,
where $A_n$ satisfies (3). The first few values of $(A_n, B_n)$ are depicted in Table 4.
Note that these are not P-positions: we can move $(A_n, B_n) \to (A_i, B_i)$ in
many ways; e.g., $(4, 10) \to (0, 0)$ satisfies (bb).

The function $f(x_1, y_1, x_0) = \lfloor (x_1 + 1)/x_0 \rfloor + 1$ is positive. Since

$\lfloor (A_{n-1} + 1)/A_n \rfloor + 1 + \lfloor (A_{n-2} + 1)/A_{n-1} \rfloor + 1 \ge \lfloor (A_{n-2} + 1)/A_n \rfloor + 1 = 1$,

it is also semi-additive. But it is not monotone. From (9),
$B_n = B_{n-1} - A_{n-1} + A_n + \lfloor (A_{n-1} + 1)/A_n \rfloor + 1$. The first few values of
$\mathcal{S} = \bigcup_{n=0}^{\infty} (A_n, B_n)$ are shown in Table 5. The game-position $(4, 7) \notin \mathcal{S}$, but it
cannot be moved into $\mathcal{S}$. Hence $\mathcal{S} \ne \mathcal{P}$. (Incidentally, note that the sequence
$B_n$ consists of all nonnegative multiples of 3.)

n      0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16
A_n    0   1   3   4   5   6   7   9  11  13  15  17  18  19  20  21  23
B_n    0   2   8  10  12  14  16  22  28  34  40  46  48  50  52  54  60

Table 4. The first few values of $\mathcal{S}$ for $f = (x_0 - x_1)^2$.

n      0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16
A_n    0   1   2   4   5   7   8  10  11  13  14  16  17  19  20  22  23
B_n    0   3   6   9  12  15  18  21  24  27  30  33  36  39  42  45  48

Table 5. The first few values of $\mathcal{S}$ for $f = \lfloor (x_1 + 1)/x_0 \rfloor + 1$.

n      0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16
A_n    0   1   2   4   5   6   8   9  10  11  14  15  16  17  18  19  20
B_n    0   1   3   7  12  13  21  30  31  42  45  60  61  78  79  98  99

Table 6. The first few values of $\mathcal{S}$-positions for $f = (1 + (-1)^{y_1 + 1})\, x_1/2$.

Lastly, consider $f(x_1, y_1, x_0) = (1 + (-1)^{y_1 + 1})\, x_1/2$. We see easily that
$f$ is semi-additive, and it is trivially monotone. But whenever $y_1$ is even,
$f$ is not positive. We have $B_n = A_n + B_{n-1} - (1 + (-1)^{B_{n-1}})\, A_{n-1}/2$.
Table 6 shows the first few $\mathcal{S}$-positions. These are not P-positions: the position
$(10, 29) \notin \mathcal{S}$ cannot be moved into any position in $\mathcal{S}$. ■
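
The first two counterexamples in the proof of Proposition 2 can be replayed mechanically. The Python sketch below is an illustration with hypothetical helper names (`s_pairs`, `can_move`, `f_square`, `f_floor`); it assumes positions are written with the smaller pile first and, to be generous, tests a move of type (bb) in both pile orientations.

```python
def s_pairs(f, count):
    """First `count` pairs (A_n, B_n): A_n by the mex rule (3), B_n by (9)."""
    pairs, used, a = [(0, 0)], {0}, 0
    while len(pairs) < count:
        a_prev, b_prev = pairs[-1]
        while a in used:
            a += 1
        b = f(a_prev, b_prev, a) + b_prev - a_prev + a
        pairs.append((a, b))
        used.update((a, b))
    return pairs


def can_move(f, src, dst):
    """True if dst can be reached from src by one move of type (aa) or (bb)."""
    (x, y), (a, b) = src, dst
    # Type (aa): one pile is untouched and the other is strictly reduced.
    if (a == x and b < y) or (b == y and a < x) or (b == x and a < y):
        return True
    # Type (bb): both piles strictly reduced, with |k - l| < f(a, b, x).
    for k, l in ((x - a, y - b), (x - b, y - a)):
        if k > 0 and l > 0 and abs(k - l) < f(a, b, x):
            return True
    return False


f_square = lambda x1, y1, x0: (x0 - x1) ** 2        # not semi-additive
f_floor = lambda x1, y1, x0: (x1 + 1) // x0 + 1     # not monotone

if __name__ == "__main__":
    # (4, 10) and (0, 0) both belong to S for f_square, yet the move is legal.
    print(can_move(f_square, (4, 10), (0, 0)))
    # (4, 7) is outside S for f_floor and has no move into S.
    table = s_pairs(f_floor, 17)
    print((4, 7) in table, any(can_move(f_floor, (4, 7), p) for p in table))
```

Only the seventeen tabulated pairs need to be examined for $(4, 7)$, since every later pair has both components larger than 7.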

Proof of Theorems 1 and 2. The function $f(x_1, y_1, x_0) = x_1 + 1$ is clearly
positive. Monotonicity is satisfied trivially. It is also clear that $f$ is
semi-additive. The function $f(x_1, y_1, x_0) = x_0 - x_1$ is positive, since $x_0 > x_1$. It is
also monotone. Since $(A_{n+1} - A_n) + (A_n - A_{n-1}) = A_{n+1} - A_{n-1}$, we
see that $f$ is semi-additive. Finally, the function $f(x_1, y_1, x_0) = y_1 - x_1 + 1$
is positive for all $x_1 \le y_1$ and is trivially monotone. It is also semi-additive.
Thus by Theorem 3 we have for $G_1$, $B_n = A_{n-1} + 1 + B_{n-1} - A_{n-1} + A_n =
B_{n-1} + A_n + 1$, as stated in Theorem 1. For $G_2$, (9) implies $B_n = A_n -
A_{n-1} + B_{n-1} - A_{n-1} + A_n = 2A_n - 2A_{n-1} + B_{n-1} = 2A_n$, where the last
equality follows by induction on $n$. For $G_3$, $B_n = B_{n-1} - A_{n-1} + 1 + B_{n-1} -
A_{n-1} + A_n = 2(B_{n-1} - A_{n-1}) + A_n + 1 = A_n + 2^n - 1$. Again the last
equality follows by induction. ■

5. Further Sample Games

For the examples below, we leave it to the reader to verify positivity,
monotonicity and semi-additivity of $f$. Some of these examples are elaborated on in
the next two sections.

Example 1. $f(x_1, y_1, x_0) = x_1 - \lfloor (x_1 + 1)/x_0 \rfloor + 2$. Then $B_n = B_{n-1} +
A_n - \lfloor (A_{n-1} + 1)/A_n \rfloor + 2$. The first few P-positions are depicted in Table 7.

Example 2. $f(x_1, y_1, x_0) = x_0 - x_1 + 2$. Then $B_n = B_{n-1} + 2(A_n - A_{n-1} + 1)$.
See Table 8 for the first few P-positions.

Example 3. $f(x_1, y_1, x_0) = (-1)^{y_1} - (-1)^{x_1} + 3$. Then $B_n = B_{n-1} - A_{n-1} +
A_n + (-1)^{B_{n-1}} - (-1)^{A_{n-1}} + 3$. See Table 9 for the first few P-positions.

Example 4. $f(x_1, y_1, x_0) = x_1 (1 + (-1)^{x_1}) + 1$. This leads to $B_n = B_{n-1} +
(-1)^{A_{n-1}} A_{n-1} + A_n + 1$. Table 10 exhibits the first few P-positions.

n      0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16
A_n    0   1   3   4   5   6   8   9  10  11  13  14  15  16  17  19  20
B_n    0   2   7  12  18  25  35  45  56  68  83  98 114 131 149 170 191

Table 7. The first few values of $\mathcal{S}$ for $f = x_1 - \lfloor (x_1 + 1)/x_0 \rfloor + 2$.

n      0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16
A_n    0   1   2   3   5   6   7   9  10  11  13  14  15  16  17  19  20
B_n    0   4   8  12  18  22  26  32  36  40  46  50  54  58  62  68  72

Table 8. The first few values of $\mathcal{S}$ for $f = x_0 - x_1 + 2$.

Table 10. The first few values of $\mathcal{S}$ for $f = x_1 (1 + (-1)^{x_1}) + 1$.
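
The verification left to the reader in Examples 1 to 4 can be spot-checked numerically. The following Python sketch is an illustration only: it tests positivity and monotonicity on a small grid and semi-additivity on the first seventeen pairs, which is evidence rather than a proof, and the names `s_pairs`, `check_conditions` and `example1` to `example4` are hypothetical.

```python
def s_pairs(f, count):
    """First `count` pairs (A_n, B_n): A_n by the mex rule (3), B_n by (9)."""
    pairs, used, a = [(0, 0)], {0}, 0
    while len(pairs) < count:
        a_prev, b_prev = pairs[-1]
        while a in used:
            a += 1
        b = f(a_prev, b_prev, a) + b_prev - a_prev + a
        pairs.append((a, b))
        used.update((a, b))
    return pairs


def check_conditions(f, count=17, grid=12):
    """Return (positive, monotone, semi_additive) as observed on a finite sample."""
    pairs = s_pairs(f, count)
    sample = [(x1, y1, x0) for x1 in range(grid) for y1 in range(x1, grid)
              for x0 in range(x1 + 1, grid + 1)]       # y1 >= x1 >= 0, x0 > x1
    positive = all(f(x1, y1, x0) > 0 for x1, y1, x0 in sample)
    monotone = all(f(x1, y1, x0) <= f(x1, y1, x0 + 1) for x1, y1, x0 in sample)
    # Semi-additivity on the pairs: for n > m >= 0,
    # sum_{i=0}^{m} f(A_{n-1-i}, B_{n-1-i}, A_{n-i}) >= f(A_{n-m-1}, B_{n-m-1}, A_n).
    semi_additive = all(
        sum(f(*pairs[n - 1 - i], pairs[n - i][0]) for i in range(m + 1))
        >= f(*pairs[n - m - 1], pairs[n][0])
        for n in range(1, count) for m in range(n))
    return positive, monotone, semi_additive


example1 = lambda x1, y1, x0: x1 - (x1 + 1) // x0 + 2
example2 = lambda x1, y1, x0: x0 - x1 + 2
example3 = lambda x1, y1, x0: (-1) ** y1 - (-1) ** x1 + 3
example4 = lambda x1, y1, x0: x1 * (1 + (-1) ** x1) + 1

if __name__ == "__main__":
    for f in (example1, example2, example3, example4):
        print(check_conditions(f), s_pairs(f, 8))
```

The same `s_pairs` call regenerates the $B_n$ rows listed for Examples 1 and 2 in Tables 7 and 8.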











6. Computational Complexity Issues 

What is the computational complexity of computing the winning strategy
for our games? Given a position $(x, y)$ with $0 \le x \le y < \infty$, the statement
of Theorem 3 enables us to compute the table of P-positions. It suffices to
compute it up to the smallest $n = n_0$ such that $A_{n_0} \ge x$, and thus determine
whether $(x, y) \in \mathcal{P}$ or in $\mathcal{N}$. The proof of Theorem 3 then enables us, if
$(x, y) \in \mathcal{N}$, to make a winning move to a position in $\mathcal{P}$. The latter part of the
strategy, that of making a winning move, is clearly polynomial. The first part,
determining whether or not $(x, y) \in \mathcal{P}$, is linear in $x$, since $A_{n_0} \le 2x$ by (15).
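
The two-step strategy just described can be sketched in a few lines. This Python illustration follows cases (i) and (ii) of the proof of Theorem 3 and assumes that $f$ is positive, monotone and semi-additive and that $0 \le x \le y$; the names `s_table`, `winning_move`, `is_legal`, `g1`, `g2`, `g3` are hypothetical.

```python
def s_table(f, x):
    """Pairs (A_n, B_n) from (3) and (9), up to the smallest n0 with A_n0 >= x."""
    pairs, used, a = [(0, 0)], {0}, 0
    while pairs[-1][0] < x:
        a_prev, b_prev = pairs[-1]
        while a in used:                      # A_n = mex of earlier A_i, B_i
            a += 1
        b = f(a_prev, b_prev, a) + b_prev - a_prev + a   # recurrence (9)
        pairs.append((a, b))
        used.update((a, b))
    return pairs


def winning_move(f, x, y):
    """Return None if (x, y) is a P-position, else a P-position one move away."""
    pairs = s_table(f, x)
    a_n, b_n = pairs[-1]
    if a_n != x:                              # case (i): x = B_j for some j >= 1
        return next(p for p in pairs if p[1] == x)
    if y == b_n:
        return None                           # (x, y) = (A_n, B_n)
    if y > b_n:                               # case (ii) with y > B_n
        return (a_n, b_n)
    for a_m, b_m in pairs[:-1]:               # case (ii) with A_n <= y < B_n
        low = b_m - a_m
        if low <= y - x < low + f(a_m, b_m, x):   # interval (16), smallest m
            return (a_m, b_m)
    raise ValueError("f does not satisfy the hypotheses of Theorem 3")


def is_legal(f, src, dst):
    """Check a move src -> dst, both written with the smaller pile first."""
    (x, y), (a, b) = src, dst
    if (a == x and b < y) or (b == y and a < x) or (b == x and a < y):
        return True                           # type (aa)
    k, l = x - a, y - b
    return k > 0 and l > 0 and abs(k - l) < f(a, b, x)   # type (bb), condition (8)


g1 = lambda x1, y1, x0: x1 + 1        # constraint function of G_1
g2 = lambda x1, y1, x0: x0 - x1       # constraint function of G_2
g3 = lambda x1, y1, x0: y1 - x1 + 1   # constraint function of G_3

if __name__ == "__main__":
    for pos in ((1, 2), (3, 5), (2, 5), (3, 9), (0, 4)):
        print(pos, "->", winning_move(g1, *pos))
```

The table is rebuilt on every call, so one query costs time linear in $x$, in line with the bound $A_{n_0} \le 2x$ above; it does not address the succinctness issue discussed next.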

Our games, however, are succinct, i.e., the input size is $\Omega(\log x)$ rather than
$\Omega(x)$ (assuming that $y$ is bounded by a polynomial in $x$). Thus their complexity
is not obvious a priori. Even if the $B_n$-sequence grows exponentially,
polynomiality of the strategy does not necessarily follow. For example, I do not know
whether the sequence $B_n$ of $G_3$ can be computed polynomially.

Special sequences are known to be computable polynomially. For example,
consider the numeration system with bases defined by the recurrence $u_n =
(s + t - 1)u_{n-1} + s u_{n-2}$ $(n \ge 1)$, where $s, t \in \mathbb{Z}_{>0}$, with initial conditions
$u_{-1} = 1/s$, $u_0 = 1$. It follows from Fraenkel (1985) that every positive
integer $N$ has a unique representation of the form $N = \sum_{i \ge 0} d_i u_i$, with digits
$d_i \in \{0, \ldots, s + t - 1\}$, such that $d_{i+1} = s + t - 1 \implies d_i < s$ for all $i \in \mathbb{Z}_{\ge 0}$.
The representation of the first few entries for the special case $s = 2$, $t = 2$, is
depicted in Table 11.

Table 11. Representation of the first few integers in a special numeration system.

If we compare Table 11 with Table 8, we might note the following two
properties:

■ All the $A_n$ have representations ending in an even number of 0s, and all
the $B_n$ have representations ending in an odd number of 0s.

■ For every $(A_n, B_n) \in \mathcal{P}$, the representation of $B_n$ is the “left shift” of
the representation of $A_n$.

Thus (1, 4) of Table 11 has representation (1, 10), and (6, 22) has
representation (12, 120): 10 is the “left shift” of 1, 120 the left shift of 12.

These properties hold, in fact, in general for Example 2, which is a special 
case of another family of sequences and games analyzed in Fraenkel (1998). 
They enable one to win in polynomial time for that family. 
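
The two properties can be observed directly for the first few pairs of Example 2. The sketch below is an illustration under stated assumptions: digits are produced greedily from the largest base downwards, which for $s = t = 2$ (bases 1, 4, 14, 50, ...) yields digits obeying the stated constraint; the names `bases`, `represent`, `trailing_zeros`, `s_pairs` and `example2` are hypothetical.

```python
def bases(s, t, limit):
    """Bases u_0 < u_1 < ... that do not exceed `limit`."""
    u = [1, s + t]            # u_1 = (s+t-1)*u_0 + s*u_{-1} = s + t, as u_{-1} = 1/s
    while u[-1] <= limit:
        u.append((s + t - 1) * u[-1] + s * u[-2])
    return [b for b in u if b <= limit]


def represent(n, s=2, t=2):
    """Greedy digits of n >= 1, most significant digit first."""
    digits = []
    for b in reversed(bases(s, t, n)):
        d, n = divmod(n, b)
        digits.append(d)
    return digits


def trailing_zeros(digits):
    """Number of zeros at the end of a digit list."""
    count = 0
    for d in reversed(digits):
        if d != 0:
            break
        count += 1
    return count


def s_pairs(f, count):
    """First `count` pairs (A_n, B_n): A_n by the mex rule (3), B_n by (9)."""
    pairs, used, a = [(0, 0)], {0}, 0
    while len(pairs) < count:
        a_prev, b_prev = pairs[-1]
        while a in used:
            a += 1
        b = f(a_prev, b_prev, a) + b_prev - a_prev + a
        pairs.append((a, b))
        used.update((a, b))
    return pairs


example2 = lambda x1, y1, x0: x0 - x1 + 2   # constraint function of Example 2

if __name__ == "__main__":
    for a, b in s_pairs(example2, 17)[1:]:
        print(a, represent(a), b, represent(b))
```

For instance, `represent(6)` gives the digits 1, 2 and `represent(22)` gives 1, 2, 0, matching the pair (12, 120) quoted above.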

However, we do not even know whether there are NP-hard sequences. A case 
in point is the infinite family of octal games (Guy and Smith, 1956; Berlekamp, 
Conway, and Guy, 1982, ch. 4), even for the subfamily where there are only 
finitely many nonzero octal digits. Some octal games have been shown to 
have polynomial strategies (see, e.g., Gangolli and Plambeck, 1989) but the 
complexity of most is unknown. 

We mention very briefly other relevant complexities. They include
Kolmogorov complexity, subword complexity, palindrome complexity, and, we
might add, squares complexity. The subword complexity $c(n)$ of a sequence $S$
is the number of distinct words of length $n$ occurring in $S$. In Allouche et al.
(2003), this notion is attributed to Ehrenfeucht, Lee, and Rozenberg (1975).
Surveys can be found in Allouche (1994), Ferenczi (1999), Ferenczi and Kasa
(1999). The palindrome complexity $p(n)$ of $S$ is the number of distinct
palindromes of length $n$ in $S$. See, e.g., Damanik and Zare (2000), and Allouche
and Shallit (2003). Define the squares complexity $s(n)$ of $S$ as the number
of distinct squares of length $n$ in $S$. Thus the result of Fraenkel and Simpson
(1995) implies that there are binary sequences for which $s(2) = 2$, $s(4) = 1$,
$s(2k) = 0$ for all $k > 2$. There is also the notion of program complexity (Daley,
1973, 1974, 1975) concerning the complexity of computing a sequence, which
is related to Kolmogorov (1968) complexity.

7. Epilogue 

We have defined an infinite class of 2-pile subtraction games with two types
of moves: (aaa) remove any positive number from a single pile; (bbb) remove
$k > 0$ from one pile, $\ell > 0$ from the other. This move is restricted by the
requirement $|k - \ell| < f$, where $f$ is a positive real-valued function. We
have shown that a pair $A_n$, $B_n$ of judiciously chosen complementary sequences
constitutes the set of P-positions if and only if $f$ is monotone and semi-additive.

As we have pointed out, the generalized Wythoff game (Fraenkel, 1982) is 
a special case of the family of games considered here. It has the property that 
a polynomial strategy can be given by using a natural numeration system, and 
noting that the A n members are characterized by ending in an even number of 
0s in that representation, and the B n being their left shifts. A similar situation 
exists for $G_2$, but with the standard binary representation as numeration system. 
With the game in Example 2, an essentially different numeration system (see 
Fraenkel, 1998) can be associated to the same effect. 

Further studies 

1. With which games can we associate an appropriate numeration system so
as to establish a polynomial strategy?

2. Extend the games in a natural way to handle more than two piles. For
Wythoff's game, I have a conjecture (see Guy and Nowakowski, 2002,
Problem 53; Fraenkel, 2003, Section 5).

3. Compute the Sprague-Grundy function $g$ for the games, which will enable
one to play sums of games. For Wythoff's game this is an as yet unsolved problem,
though eventual additive periodicity has been proved (Dress, Flammenkamp,
and Pink, 1999; Landman, 2002).

4. Compute a strategy for the games when played in misère version, i.e.,
the player making the last move loses. This is easy for Wythoff's game (see
Berlekamp, Conway, and Guy, 1982, ch. 13).

5. We have already mentioned the question of the polynomiality of the
strategy. Is there a 2-pile subtraction game that is Pspace-complete?

6. Computation of complexities of P-position sequences, such as Kolmogorov-,
program-, subword-, palindrome-, squares-complexities. For the $A_n$-sequence
of Example 2, the subword complexity was computed in Fraenkel,
Seeman, and Simpson (2001).

7. Make an about-face: begin with pairs of known complementary sequences,
and design matching 2-pile subtraction games.